59 files changed, 2574 insertions, 1259 deletions
@@ -3,12 +3,21 @@ # 4get 4get is a metasearch engine that doesn't suck (they live in our walls!) -## About 4get +# About 4get https://4get.ca/about -## Try it out +# Try it out https://4get.ca +# Totally unbiased comparison between alternatives + +| | 4get | searx(ng) | librex | araa | +|----------------------------|-------------------------|-----------|-------------|----------| +| RAM usage | 200-400mb~ | 2GB~ | 200-400mb~ | 2GB~ | +| Does it suck | no (debunked by snopes) | yes | yes | a little | +| Does it work | ye | no | no | ye | +| Did the dev commit suicide | not until my 30s | idk | yes | no | + ## Supported websites 1. Web - DuckDuckGo @@ -36,7 +45,6 @@ https://4get.ca 4. News - DuckDuckGo - Brave - - Google - Mojeek 5. Music @@ -55,15 +63,15 @@ https://4get.ca More scrapers are coming soon. I currently want to add Google web/video/news search, HackerNews (durr orange site!!) and Qwant. A shopping and files tab is also in my todo list. -# Setup +# Installation This section is still to-do. You will need to figure shit out for some of the apache2 and nginx stuff. Everything else should be OK. -## Apache +## Install on Apache Login as root. ```sh -apt install apache2 certbot php-dom php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php python3-certbot-apache +apt install apache2 certbot php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php python3-certbot-apache service apache2 start a2enmod rewrite ``` @@ -90,7 +98,7 @@ chmod 777 -R icons/ Restart the service for good measure... `service apache2 restart` -## NGINX +## Install on NGINX Login as root. 
@@ -138,10 +146,54 @@ ln -s /etc/nginx/sites-available/4get.conf /etc/nginx/sites-available/4get.conf Now test the nginx config with `nginx -t`, if it says that everything is good, restart nginx using `systemctl restart nginx` -## Setup encryption +## Install using Docker (lol u lazy fuck) + +``` +docker run -d -p 80:80 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" luuul/4get:latest +``` + +...Or with SSL: +``` +docker run -d -p 443:443 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" -v /etc/letsencrypt/live/domain.tld:/etc/4get/certs luuul/4get:latest +``` + +replace enviroment variables FOURGET_SERVER_NAME and FOURGET_SERVER_ADMIN_EMAIL with relevant values + +if the certificate files are not mounted to /etc/4get/certs the service listens to port 80 +the certificate directory expects files named `cert.pem`, `chain.pem`, `privkey.pem` + +## Install using Docker Compose +copy `docker-compose.yaml` + +create a directory with images named `banners` for example and mount to `/var/www/html/4get/banner` +to serve custom banners + +``` +version: "3.7" + +services: + fourget: + image: luuul/4get:latest + restart: always + environment: + - FOURGET_SERVER_NAME=4get.ca + - FOURGET_SERVER_ADMIN_EMAIL="you@example.com" + + ports: + - "80:80" + - "443:443" + + volumes: + - /etc/letsencrypt/live/domain.tld:/etc/4get/certs + - ./banners:/var/www/html/4get/banner +``` + +Replace relevant values and start with `docker-compose up -d` + +# Encryption setup I'm schizoid (as you should) so I'm gonna setup 4096bit key encryption. To complete this step, you need a domain or subdomain in your possession. Make sure that the DNS shit for your domain has propagated properly before continuing, because certbot is a piece of shit that will error out the ass once you reach 5 attempts under an hour. 
-### Apache +## Encryption setup on Apache ```sh certbot --apache --rsa-key-size 4096 -d www.yourdomain.com -d yourdomain.com @@ -169,7 +221,7 @@ Restart again service apache2 restart ``` -### NGINX +## Encryption setup on NGINX Generate a certificate for the domain using: @@ -180,15 +232,13 @@ certbot --nginx --key-type ecdsa -d www.yourdomain.com -d yourdomain.com After doing that certbot should deploy the certificate automatically into your 4get nginx config file. It should be ready to use at that point. -## Captcha +# Jesse it is time to configure the server the fucking bots are back -Right now the setup for this shit is absolutely awful. +Wohoo the awful piece of shit setup and fiddling with 3 gazillion files is GONE. All you need to do to configure your shit is to go in `data/config.php` and edit the self-documenting configuration file. You can also specify proxies in `data/proxies/whatever.txt` and captcha images in `data/captcha/category/1.png`... I further explain how to deal with that garbage in the config file I mentionned. -Edit line 190 in `lib/captcha_gen.php` and specify your image sets. You can't disable the captcha right now lol. Just use a previous commit if you want to do that. Call me a shitcoder all you want I've had no energy lately. Images must be stored in `data/captcha`. Create a folder for each category. All files in there should be named from `1.png` to `321839.png`, for example. +# (Optional) Tor setup -## Tor Setup - -1. Install tor. +1. Install `tor`. 2. Open `/etc/tor/torrc` 3. Go to the line that contains `HiddenServiceDir` and `HiddenServicePort` 4. Uncomment those 2 lines and set them like this: @@ -205,7 +255,7 @@ After you get your onion address you will need to configure your Apache or Nginx I don't know to configure this shit on Apache so here is the NGINX one. 
-### NGINX +## Tor setup on NGINX Open your current 4get NGINX config (that is under `/etc/nginx/sites-available/`) and append this to the end of the file: @@ -240,49 +290,5 @@ server { Obviously replace `<youronionaddress>` by the onion address of `/var/lib/tor/4get/hostname` and then check if the nginx config is valid with `nginx -t` if yes, then restart the nginx service and try opening the onion address into the Tor Browser. You can see a real world example [here](https://git.zzls.xyz/Fijxu/etc-configs/src/branch/selfhost/nginx/sites-available/4get.zzls.xyz.conf) -## Docker Install - - -``` -docker run -d -p 80:80 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" luuul/4get:latest -``` - -With SSL -``` -docker run -d -p 443:443 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" -v /etc/letsencrypt/live/domain.tld:/etc/4get/certs luuul/4get:latest -``` - -replace enviroment variables FOURGET_SERVER_NAME and FOURGET_SERVER_ADMIN_EMAIL with relevant values - -if the certificate files are not mounted to /etc/4get/certs the service listens to port 80 -the certificate directory expects files named `cert.pem`, `chain.pem`, `privkey.pem` - -## Docker compose - -copy `docker-compose.yaml` - -create a directory with images named `banners` for example and mount to `/var/www/html/4get/banner` -to serve custom banners - -``` -version: "3.7" - -services: - fourget: - image: luuul/4get:latest - restart: always - environment: - - FOURGET_SERVER_NAME=4get.ca - - FOURGET_SERVER_ADMIN_EMAIL="you@example.com" - - ports: - - "80:80" - - "443:443" - - volumes: - - /etc/letsencrypt/live/domain.tld:/etc/4get/certs - - ./banners:/var/www/html/4get/banner -``` - -Replace relevant values and start with `docker-compose up -d` - +# Contact +shit breaks all the time but I repair it all the time too. 
Email me here: will<at>lolcat(dot)ca @@ -1,128 +1,23 @@ <?php +include "data/config.php"; include "lib/frontend.php"; $frontend = new frontend(); echo - '<!DOCTYPE html>' . - '<html lang="en">' . - '<head>' . - '<meta http-equiv="Content-Type" content="text/html;charset=utf-8">' . - '<title>About</title>' . - '<link rel="stylesheet" href="/static/style.css">' . - '<meta name="viewport" content="width=device-width,initial-scale=1">' . - '<meta name="robots" content="index,follow">' . - '<link rel="icon" type="image/x-icon" href="/favicon.ico">' . - '<meta name="description" content="4get.ca: About">' . - '<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">' . - '</head>' . - '<body class="' . $frontend->getthemeclass(false) . 'about">'; - -include "data/instances.php"; -$compiledinstancelist = ""; -foreach ($instancelist as $instance) -{ - $compiledinstancelist .= "<tr> <td>".$instance["name"]."</td>"; - $compiledinstancelist .= "<td> <a href=\"".$instance["address"]["uri"]."\">".$instance["address"]["displayname"]."</a>"; - foreach ($instance["altaddresses"] as $alt) - { - $compiledinstancelist .= "<a href=\"".$alt["uri"]."\">(".$alt["displayname"].")</a></td>"; - } - $compiledinstancelist .= "</tr>"; -} + $frontend->load( + "header_nofilters.html", + [ + "title" => "About", + "class" => " class=\"about\"" + ] + ); $left = - '<a href="/" class="link">< Go back</a> - - <h1>Set as default search engine</h1> - <a href="#firefox"><h2 id="firefox">On Firefox and other Gecko based browsers</h2></a> - To set this as your default search engine on Firefox, right click the URL bar and select <div class="code-inline">Add "4get"</div>. Then, visit <a href="about:preferences#search" target="_BLANK" class="link">about:preferences#search</a> and select <div class="code-inline">4get</div> in the dropdown menu. 
- - <a href="#chrome"><h2 id="chrome">On Chromium and Blink based browsers</h2></a> - Click the 3 superpositioned dots at the top right of the screen and click on <div class="code-inline">Settings</div>, then search for <div class="code-inline">default search engine</div>, or visit <a href="chrome://settings/searchEngines">chrome://settings/searchEngines</a>.<br><br> - - Once you\'re there, click the pencil on the last entry under "Search engines" (it\'s probably DuckDuckGo). Once you do that, a popup will appear. Populate it with the following information: - - <table> - <tr> - <td><b>Field</b></td> - <td><b>Value</b></td> - </tr> - <tr> - <td>Search engine</td> - <td>4get</td> - </tr> - <tr> - <td>Shortcut</td> - <td>4get</td> - </tr> - <tr> - <td>URL with %s in place of query</td> - <td>https://4get.ca/web?s=%s</td> - </tr> - </table> - - Once that\'s done, click <div class="code-inline">Save</div>. Then, on the right handside of the newly created entry, open the dropdown menu and select <div class="code-inline">Make default</div>. - - <h1>Frequently asked questions</h1> - <a href="#what-is-this"><h2 id="what-is-this">What is this?</h2></a> - This is a metasearch engine that gets results from other engines, and strips away all of the tracking parameters and Microsoft/globohomo bullshit they add. Most of the other alternatives to Google jack themselves off about being ""privacy respecting"" or whatever the fuck but it always turns out to be a total lie, and I just got fed up with their shit honestly. Alternatives like Searx or YaCy all fucking sucks so I made my own thing. - - <a href="#goal"><h2 id="goal">My goal</h2></a> - Provide users with a privacy oriented, extremely lightweight, ad free, free as in freedom (and free beer!) way to search for documents around the internet, with minimal, optional javascript code. 
My long term goal would be to build my own index (that doesn\'t suck) and provide users with an unbiased search engine, with no political inclinations. - - <a href="#logs"><h2 id="logs">Do you keep logs?</h2></a> - I store data temporarly to get the next page of results. This might include search queries, tokens and other parameters. These parameters are encrypted using <div class="code-inline">aes-256-gcm</div> on the serber, for which I give you a key (also known internally as <div class="code-inline">npt</div> token). When you make a request to get the next page, you supply the token, the data is decrypted and the request is fulfilled. This encrypted data is deleted after 15 minutes, or after it\'s used, whichever comes first.<br><br> - - I <b>don\'t</b> log IP addresses, user agents, or anything else. The <div class="code-inline">npt</div> tokens are the only thing that are stored (in RAM, mind you), temporarly, encrypted. - - <a href="#information-sharing"><h2 id="information-sharing">Do you share information with third parties?</h2></a> - Your search queries and supplied filters are shared with the scraper you chose (so I can get the search results, duh). I don\'t share anything else (that means I don\'t share your IP address, location, or anything of this kind). There is no way that site can know you\'re the one searching for something, <u>unless you send out a search query that de-anonymises you.</u> For example, a search query like "hello my full legal name is jonathan gallindo and i want pictures of cloacas" would definitively blow your cover. 4get doesn\'t contain ads or any third party javascript applets or trackers. I don\'t profile you, and quite frankly, I don\'t give a shit about what you search on there.<br><br> - - TL;DR assume those websites can see what you search for, but can\'t see who you are (unless you\'re really dumb). 
- - <a href="#hosting"><h2 id="hosting">Where is this website hosted?</h2></a> - This website is hosted on a Contabo shitbox in the United States. - - <a href="#keyboard-shortcuts"><h2 id="keyboard-shortcuts">Keyboard shortcuts?</h2></a> - Use <div class="code-inline">/</div> to focus the search box.<br><br> - - When the image viewer is open, you can use the following keybinds:<br> - <div class="code-inline">Up</div>, <div class="code-inline">Down</div>, <div class="code-inline">Left</div>, <div class="code-inline">Right</div> to rotate the image.<br> - <div class="code-inline">CTRL+Up</div>, <div class="code-inline">CTRL+Down</div>, <div class="code-inline">CTRL+Left</div>, <div class="code-inline">CTRL+Right</div> to mirror the image.<br> - <div class="code-inline">Escape</div> to exit the image viewer. - - <a href="#instances"><h2 id="instances">Instances</h2></a> - 4get is open source, anyone can create their own 4get instance! If you wish to add your website to this list, please <a href="https://lolcat.ca">contact me</a>. - - <table> - <tr> - <td>Name</td> - <td>Address</td> - </tr> - '.$compiledinstancelist.' - </table> - - <a href="#schizo"><h2 id="schizo">How can I trust you?</h2></a> - You just sort of have to take my word for it right now. If you\'d rather trust yourself instead of me (I believe in you!!), all of the code on this website is available trough my <a href="https://git.lolcat.ca/lolcat" class="link">git page</a> for you to host on your own machines. Just a reminder: if you\'re the sole user of your instance, it doesn\'t take immense brain power for Microshit to figure out you basically just switched IP addresses. Invite your friends to use your instance! - - <a href="#donate"><h2 id="donate">Support the project</h2></a> - Donate to me trough ko-fi: <a href="https://ko-fi.com/lolcat" target="BLANK" rel="noreferrer">ko-fi.com/lolcat</a><br> - Please donate I sent myself a donation for testing if it works and it looks fucking dumb. 
Reasons to donate are listed on there. Thank you! - - <a href="#contact"><h2 id="contact">I want to report abuse or have erotic roleplay trough email</h2></a> - I don\'t know about that second part but if you want to talk to me, just drop me an email...<br><br> - - <b>Message to all DMCA enforcers:</b> I don\'t host any of the content. Everything you see here is <u>proxied</u> trough my shitbox with no moderation. Please reach out to the people hosting the infringing content instead.<br><br> - - <a href="https://lolcat.ca" rel="dofollow" class="link">Click here to contact me!</a><br><br> - - <a href="https://validator.w3.org/nu/?doc=https%3A%2F%2F4get.ca" title="W3 Valid!"> - <img src="/static/icon/w3html.png" alt="Valid W3C HTML 4.01" width="88" height="31"> - </a>'; - -// trim out whitespace -$left = explode("\n", $left); + explode( + "\n", + file_get_contents("template/about.html") + ); $out = ""; diff --git a/ami4get.php b/ami4get.php new file mode 100644 index 0000000..f2d48bf --- /dev/null +++ b/ami4get.php @@ -0,0 +1,27 @@ +<?php + +header("Content-Type: application/json"); +header("Access-Control-Allow-Origin: *"); + +include "data/config.php"; + +$bot_requests = apcu_fetch("captcha"); +$real_requests = apcu_fetch("real_requests"); + +echo json_encode( + [ + "status" => "ok", + "service" => "4get", + "server" => [ + "name" => config::SERVER_NAME, + "description" => config::SERVER_LONG_DESCRIPTION, + "bot_protection" => config::BOT_PROTECTION, + "real_requests" => $real_requests === false ? 0 : $real_requests, + "bot_requests" => $bot_requests === false ? 0 : $bot_requests, + "api_enabled" => config::API_ENABLED, + "alt_addresses" => config::ALT_ADDRESSES, + "version" => config::VERSION + ], + "instances" => config::INSTANCES + ] +); @@ -119,6 +119,11 @@ /_____/_/ /_/\__,_/ .___/\____/_/_/ /_/\__/____/ /_/ ++ /ami4get + Tells you basic information about the 4get instance. CORS requests + are allowed on this endpoint. 
+ + + /api/v1/web + &extendedsearch When using the ddg(DuckDuckGo) scraper, you may make use of the diff --git a/api/v1/ac.php b/api/v1/ac.php index 3ee1481..b1ec7dd 100644 --- a/api/v1/ac.php +++ b/api/v1/ac.php @@ -1,5 +1,6 @@ <?php +include "../../data/config.php"; new autocomplete(); class autocomplete{ @@ -17,7 +18,7 @@ class autocomplete{ "yep" => "https://api.yep.com/ac/?query={searchTerms}", "marginalia" => "https://search.marginalia.nu/suggest/?partial={searchTerms}", "yt" => "https://suggestqueries-clients6.youtube.com/complete/search?client=youtube&q={searchTerms}", - "sc" => "https://api-v2.soundcloud.com/search/queries?q={searchTerms}&client_id=ArYppSEotE3YiXCO4Nsgid2LLqJutiww&limit=10&offset=0&linked_partitioning=1&app_version=1693487844&app_locale=en" + "sc" => "https://api-v2.soundcloud.com/search/queries?q={searchTerms}&client_id=" . config::SC_CLIENT_TOKEN . "&limit=10&offset=0&linked_partitioning=1&app_version=1693487844&app_locale=en" ]; /* @@ -107,7 +108,8 @@ class autocomplete{ [ $_GET["s"], $json - ] + ], + JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES ); break; @@ -132,7 +134,8 @@ class autocomplete{ [ $_GET["s"], $json - ] + ], + JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES ); break; @@ -150,7 +153,8 @@ class autocomplete{ [ $_GET["s"], $json - ] + ], + JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES ); break; @@ -162,7 +166,8 @@ class autocomplete{ [ $_GET["s"], $json[1] // ensure it contains valid key 0 - ] + ], + JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES ); break; } @@ -170,45 +175,54 @@ class autocomplete{ private function get($url, $query){ - $curlproc = curl_init(); - - $url = str_replace("{searchTerms}", urlencode($query), $url); - - curl_setopt($curlproc, CURLOPT_URL, $url); - - curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding - curl_setopt($curlproc, CURLOPT_HTTPHEADER, - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", - "Accept: application/json, 
text/javascript, */*; q=0.01", - "Accept-Language: en-US,en;q=0.5", - "Accept-Encoding: gzip", - "DNT: 1", - "Connection: keep-alive", - "Sec-Fetch-Dest: empty", - "Sec-Fetch-Mode: cors", - "Sec-Fetch-Site: same-site"] - ); - - curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true); - curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2); - curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); - curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); - curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); - - $data = curl_exec($curlproc); + try{ + $curlproc = curl_init(); + + $url = str_replace("{searchTerms}", urlencode($query), $url); + + curl_setopt($curlproc, CURLOPT_URL, $url); + + curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding + curl_setopt($curlproc, CURLOPT_HTTPHEADER, + ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", + "Accept: application/json, text/javascript, */*; q=0.01", + "Accept-Language: en-US,en;q=0.5", + "Accept-Encoding: gzip", + "DNT: 1", + "Connection: keep-alive", + "Sec-Fetch-Dest: empty", + "Sec-Fetch-Mode: cors", + "Sec-Fetch-Site: same-site"] + ); + + curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true); + curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2); + curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); + curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); + curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $data = curl_exec($curlproc); + + if(curl_errno($curlproc)){ + + throw new Exception(curl_error($curlproc)); + } + + curl_close($curlproc); + return $data; - if(curl_errno($curlproc)){ + }catch(Exception $error){ - throw new Exception(curl_error($curlproc)); + do404("Curl error: " . 
$error->getMessage()); } - - curl_close($curlproc); - return $data; } private function do404($error){ - echo json_encode(["error" => $error]); + echo json_encode( + ["error" => $error], + JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES + ); die(); } @@ -218,7 +232,8 @@ class autocomplete{ [ $_GET["s"], [] - ] + ], + JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES ); die(); } diff --git a/api/v1/images.php b/api/v1/images.php index 34510b4..3072b05 100644 --- a/api/v1/images.php +++ b/api/v1/images.php @@ -1,8 +1,14 @@ <?php +chdir("../../"); header("Content-Type: application/json"); -chdir("../../"); +include "data/config.php"; +if(config::API_ENABLED === false){ + + echo json_encode(["status" => "The server administrator disabled the API!"]); + return; +} include "lib/frontend.php"; $frontend = new frontend(); diff --git a/api/v1/music.php b/api/v1/music.php index 3c30953..409e6f0 100644 --- a/api/v1/music.php +++ b/api/v1/music.php @@ -1,8 +1,14 @@ <?php +chdir("../../"); header("Content-Type: application/json"); -chdir("../../"); +include "data/config.php"; +if(config::API_ENABLED === false){ + + echo json_encode(["status" => "The server administrator disabled the API!"]); + return; +} include "lib/frontend.php"; $frontend = new frontend(); diff --git a/api/v1/news.php b/api/v1/news.php index bd8678f..ddfd72a 100644 --- a/api/v1/news.php +++ b/api/v1/news.php @@ -1,8 +1,14 @@ <?php +chdir("../../"); header("Content-Type: application/json"); -chdir("../../"); +include "data/config.php"; +if(config::API_ENABLED === false){ + + echo json_encode(["status" => "The server administrator disabled the API!"]); + return; +} include "lib/frontend.php"; $frontend = new frontend(); diff --git a/api/v1/videos.php b/api/v1/videos.php index a42b29b..dab29af 100644 --- a/api/v1/videos.php +++ b/api/v1/videos.php @@ -1,8 +1,14 @@ <?php +chdir("../../"); header("Content-Type: application/json"); -chdir("../../"); +include "data/config.php"; +if(config::API_ENABLED === false){ + + 
echo json_encode(["status" => "The server administrator disabled the API!"]); + return; +} include "lib/frontend.php"; $frontend = new frontend(); diff --git a/api/v1/web.php b/api/v1/web.php index 61bf82a..dc1a7cc 100644 --- a/api/v1/web.php +++ b/api/v1/web.php @@ -1,8 +1,14 @@ <?php +chdir("../../"); header("Content-Type: application/json"); -chdir("../../"); +include "data/config.php"; +if(config::API_ENABLED === false){ + + echo json_encode(["status" => "The server administrator disabled the API!"]); + return; +} include "lib/frontend.php"; $frontend = new frontend(); @@ -21,7 +27,13 @@ new captcha($null, $null, $null, "web", false); $get = $frontend->parsegetfilters($_GET, $filters); -if(!isset($_GET["extendedsearch"])){ +if( + isset($_GET["extendedsearch"]) && + $_GET["extendedsearch"] == "yes" +){ + + $get["extendedsearch"] = "yes"; +}else{ $get["extendedsearch"] = "no"; } @@ -7,6 +7,7 @@ if(!isset($_GET["s"])){ die(); } +include "data/config.php"; include "lib/curlproxy.php"; $proxy = new proxy(); diff --git a/audio_sc.php b/audio_sc.php index 9a227e3..36a6855 100644 --- a/audio_sc.php +++ b/audio_sc.php @@ -1,5 +1,6 @@ <?php +include "data/config.php"; new sc_audio(); class sc_audio{ diff --git a/data/config.php b/data/config.php new file mode 100644 index 0000000..f2ca214 --- /dev/null +++ b/data/config.php @@ -0,0 +1,103 @@ +<?php +class config{ + // Welcome to the 4get configuration file + // When updating your instance, please make sure this file isn't missing + // any parameters. + + // 4get version. Please keep this updated + const VERSION = 5; + + // Will be shown pretty much everywhere. + const SERVER_NAME = "4get"; + + // Will be shown in <meta> tag on home page + const SERVER_SHORT_DESCRIPTION = "They live in our walls!"; + + // Will be shown in server list ping (null for no description) + const SERVER_LONG_DESCRIPTION = null; + + // Add your own themes in "static/themes". Set to "Dark" for default theme. + // Eg. 
To use "static/themes/Cream.css", specify "Cream". + const DEFAULT_THEME = "Dark"; + + // Enable the API? + const API_ENABLED = true; + + // Bot protection + // 4get.ca has been hit with 250k bot reqs every single day for months + // you probably want to enable this if your instance is public... + // 0 = disabled + // 1 = ask for image captcha (requires image dataset & imagick 6.9.11-60) + // @TODO: 2 = invite only (users needs a pass) + const BOT_PROTECTION = 0; + + // if BOT_PROTECTION is set to 1, specify the available datasets here + // images should be named from 1.png to X.png, and be 100x100 in size + // Eg. data/captcha/birds/1.png up to 2263.png + const CAPTCHA_DATASET = [ + // example: + // ["birds", 2263], + // ["fumo_plushies", 1006], + // ["minecraft", 848] + ]; + + // List of domains that point to your servers. Include your tor/i2p + // addresses here! Must be a valid URL. Won't affect links placed on + // the homepage. + const ALT_ADDRESSES = [ + //"https://4get.alt-tld", + //"http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion" + ]; + + // Known 4get instances. MUST use the https protocol if your instance uses + // it. Is used to generate a distributed list of instances. + // To appear in the list of an instance, contact the host and if everyone added + // eachother your serber should appear everywhere. + const INSTANCES = [ + "https://4get.ca", + "https://4get.zzls.xyz", + "https://4get.silly.computer", + "https://4g.opnxng.com", + "https://4get.konakona.moe" + ]; + + // Default user agent to use for scraper requests. Sometimes ignored to get specific webpages + // Changing this might break things. + const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0"; + + // Proxy pool assignments for each scraper + // false = Use server's raw IP + // string = will load a proxy list from data/proxies + // Eg. 
"onion" will load data/proxies/onion.txt + const PROXY_DDG = false; // duckduckgo + const PROXY_BRAVE = false; + const PROXY_FB = false; // facebook + const PROXY_GOOGLE = false; + const PROXY_MARGINALIA = false; + const PROXY_MOJEEK = false; + const PROXY_SC = false; // soundcloud + const PROXY_WIBY = false; + const PROXY_YT = false; // youtube + const PROXY_YEP = false; + const PROXY_PINTEREST = false; + const PROXY_FTM = false; // findthatmeme + const PROXY_IMGUR = false; + const PROXY_YANDEX_W = false; // yandex web + const PROXY_YANDEX_I = false; // yandex images + const PROXY_YANDEX_V = false; // yandex videos + + // + // Scraper-specific parameters + // + + // SOUNDCLOUD + // Get these parameters by making a search on soundcloud with network + // tab open, then filter URLs using "search?q=". (No need to login) + const SC_USER_ID = "143860-454480-469473-289775"; + const SC_CLIENT_TOKEN = "qwfvRfz8PCoa2NldZALK7hhZFIH24Wyx"; + + // MARGINALIA + // Get an API key by contacting the Marginalia.nu maintainer. The "public" key + // works but is almost always rate-limited. + const MARGINALIA_API_KEY = "public"; +} diff --git a/data/instances.php b/data/instances.php deleted file mode 100644 index d7c26e0..0000000 --- a/data/instances.php +++ /dev/null @@ -1,62 +0,0 @@ -<?php - -// this file exists to separate instance data from the actual about page -// HTML, and to make it easier to add/modify instances cleanly. - -$instancelist = [ - [ - "name" => "lolcat's instance (master)", - "address" => [ - "uri" => "https://4get.ca/", - "displayname" => "4get.ca" - ], - "altaddresses" => [ - [ - // all these address blocks will be linked in parentheses - // e.g. 4get.ca (tor) (i2p) etc. 
-                "uri" => "http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion",
-                "displayname" => "tor"
-            ]
-        ]
-    ],
-    [
-        "name" => "zzls's Chilean instance",
-        "address" => [
-            "uri" => "https://4get.zzls.xyz/",
-            "displayname" => "4get.zzls.xyz"
-        ],
-        "altaddresses" => [
-            [
-                "uri" => "http://4get.zzlsghu6mvvwyy75mvga6gaf4znbp3erk5xwfzedb4gg6qqh2j6rlvid.onion",
-                "displayname" => "tor"
-            ]
-        ]
-    ],
-    [
-        "name" => "zzls's United States instance",
-        "address" => [
-            "uri" => "https://4getus.zzls.xyz/",
-            "displayname" => "4getus.zzls.xyz"
-        ],
-        "altaddresses" => [
-            [
-                "uri" => "http://4getus.zzlsghu6mvvwyy75mvga6gaf4znbp3erk5xwfzedb4gg6qqh2j6rlvid.onion",
-                "displayname" => "tor"
-            ]
-        ]
-    ],
-    [
-        "name" => "4get on a silly computer",
-        "address" => [
-            "uri" => "https://4get.silly.computer",
-            "displayname" => "4get.silly.computer"
-        ],
-        "altaddresses" => [
-            [
-                "uri" => "https://4get.cynic.moe/",
-                "displayname" => "fallback domain"
-            ]
-        ]
-    ]
-]
-?>
diff --git a/data/proxies/.gitignore b/data/proxies/.gitignore
new file mode 100644
index 0000000..70fd2c3
--- /dev/null
+++ b/data/proxies/.gitignore
@@ -0,0 +1,3 @@
+*
+!.gitignore
+!onion.txt
\ No newline at end of file
diff --git a/data/proxies/onion.txt b/data/proxies/onion.txt
new file mode 100644
index 0000000..c9b03f0
--- /dev/null
+++ b/data/proxies/onion.txt
@@ -0,0 +1,13 @@
+# Specify proxies by following this format:
+# <type>:<address>:<port>:<username>:<password>
+#
+# Examples:
+# https:1.3.3.7:6969:abcd:efg
+# socks4:1.2.3.4:8080::
+# raw_ip::::
+#
+# Available types:
+# raw_ip, http, https, socks4, socks5, socks4a, socks5_hostname
+
+# Local tor proxy
+socks5:localhost:9050::
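
Note on the proxy list format above: a minimal sketch of how one line could be split into its five fields, assuming the colon-separated `<type>:<address>:<port>:<username>:<password>` layout from the file header. The helper name `parse_proxy_line` is hypothetical and not part of 4get; `lib/backend.php` in this same commit does the split with `explode(":", $ip, 5)`, which this sketch mirrors.

```php
<?php
// Hypothetical helper for illustration only; not part of the 4get codebase.
// Returns the five fields of a proxy line, or null for lines to skip.
function parse_proxy_line(string $line): ?array {
    $line = trim($line);

    // Skip blank lines and "#" comments, as the file header describes.
    if ($line === "" || $line[0] === "#") {
        return null;
    }

    // Limit of 5 keeps any extra ":" inside the password field.
    $parts = explode(":", $line, 5);
    if (count($parts) !== 5) {
        return null; // malformed entry: fewer than five fields
    }

    [$type, $address, $port, $username, $password] = $parts;

    return [
        "type"     => $type,
        "address"  => $address,
        "port"     => $port,
        "username" => $username,
        "password" => $password,
    ];
}

// Sample lines taken from the examples in onion.txt.
$lines = [
    "# Local tor proxy",
    "socks5:localhost:9050::",
    "https:1.3.3.7:6969:abcd:efg",
    "raw_ip::::",
];

foreach ($lines as $line) {
    $proxy = parse_proxy_line($line);
    if ($proxy !== null) {
        echo $proxy["type"], " ", $proxy["address"], "\n";
    }
}
```

Because the split is on `:`, an IPv6 literal in the address field would not parse with this sketch. Empty username and password fields come back as empty strings, which matches the `socks5:localhost:9050::` entry.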
diff --git a/favicon.php b/favicon.php index dadb923..2a31839 100644 --- a/favicon.php +++ b/favicon.php @@ -6,6 +6,7 @@ if(!isset($_GET["s"])){ die(); } +include "data/config.php"; new favicon($_GET["s"]); class favicon{ @@ -3,6 +3,8 @@ /* Initialize random shit */ +include "data/config.php"; + include "lib/frontend.php"; $frontend = new frontend(); @@ -26,20 +28,7 @@ try{ }catch(Exception $error){ - echo - $frontend->drawerror( - "Shit", - 'This scraper returned an error:' . - '<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' . - 'Things you can try:' . - '<ul>' . - '<li>Use a different scraper</li>' . - '<li>Remove keywords that could cause errors</li>' . - '<li>Use another 4get instance</li>' . - '</ul><br>' . - 'If the error persists, please <a href="/about">contact the administrator</a>.' - ); - die(); + $frontend->drawscrapererror($error->getMessage(), $get, "images"); } if(count($results["image"]) === 0){ @@ -1,5 +1,6 @@ <?php +include "data/config.php"; include "lib/frontend.php"; $frontend = new frontend(); @@ -8,7 +9,7 @@ $images = glob("banner/*"); echo $frontend->load( "home.html", [ - "body_class" => $frontend->getthemeclass(false), + "server_short_description" => htmlspecialchars(config::SERVER_SHORT_DESCRIPTION), "banner" => $images[rand(0, count($images) - 1)] ] ); diff --git a/instances.php b/instances.php new file mode 100644 index 0000000..b9db771 --- /dev/null +++ b/instances.php @@ -0,0 +1,55 @@ +<?php + +include "lib/frontend.php"; +$frontend = new frontend(); + +include "data/config.php"; + +$params = ""; +$first = true; +foreach($_GET as $key => $value){ + + if( + !is_string($value) || + $key == "target" + ){ + + continue; + } + + if($first === true){ + + $first = false; + $params = "?"; + }else{ + + $params .= "&"; + } + + $params .= urlencode($key) . "=" . urlencode($value); +} + +if( + !isset($_GET["target"]) || + !is_string($_GET["target"]) +){ + + $target = ""; +}else{ + + $target = "/" . 
urlencode($_GET["target"]); +} + +$instances = ""; +foreach(config::INSTANCES as $instance){ + + $instances .= '<tr><td class="expand"><a href="' . htmlspecialchars($instance) . $target . $params . '" target="_BLANK" rel="noreferer">' . htmlspecialchars($instance) . '</a></td></tr>'; +} + +echo + $frontend->load( + "instances.html", + [ + "instances_html" => $instances + ] + ); diff --git a/lib/backend.php b/lib/backend.php new file mode 100644 index 0000000..209cfec --- /dev/null +++ b/lib/backend.php @@ -0,0 +1,197 @@ +<?php +class backend{ + + public function __construct($scraper){ + + $this->scraper = $scraper; + $this->requestid = apcu_inc("real_requests"); + } + + /* + Proxy stuff + */ + public function get_ip(){ + + $pool = constant("config::PROXY_" . strtoupper($this->scraper)); + if($pool === false){ + + // we don't want a proxy, fuck off! + return 'raw_ip::::'; + } + + // indent + $proxy_index_raw = apcu_inc("p." . $this->scraper); + + $proxylist = file_get_contents("data/proxies/" . $pool . ".txt"); + $proxylist = explode("\n", $proxylist); + + // ignore empty or commented lines + $proxylist = array_filter($proxylist, function($entry){ + $entry = ltrim($entry); + return strlen($entry) > 0 && substr($entry, 0, 1) != "#"; + }); + + $proxylist = array_values($proxylist); + + return $proxylist[$proxy_index_raw % count($proxylist)]; + } + + // this function is also called directly on nextpage + public function assign_proxy(&$curlproc, $ip){ + + // parse proxy line + [ + $type, + $address, + $port, + $username, + $password + ] = explode(":", $ip, 5); + + switch($type){ + + case "raw_ip": + return; + break; + + case "http": + case "https": + curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_HTTP); + curl_setopt($curlproc, CURLOPT_PROXY, $type . "://" . $address . ":" . $port); + break; + + case "socks4": + curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4); + curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . 
$port); + break; + + case "socks5": + curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5); + curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port); + break; + + case "socks4a": + curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4A); + curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port); + break; + + case "socks5_hostname": + curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME); + curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port); + break; + } + + if($username != ""){ + + curl_setopt($curlproc, CURLOPT_PROXYUSERPWD, $username . ":" . $password); + } + } + + + + /* + Next page stuff + */ + public function store($payload, $page, $proxy){ + + $page = $page[0]; + $password = random_bytes(256); // 2048 bit + $salt = random_bytes(16); + $key = hash_pbkdf2("sha512", $password, $salt, 20000, 32, true); + $iv = + random_bytes( + openssl_cipher_iv_length("aes-256-gcm") + ); + + $tag = ""; + $out = openssl_encrypt($payload, "aes-256-gcm", $key, OPENSSL_RAW_DATA, $iv, $tag, "", 16); + + $key = apcu_inc("key", 1); + + apcu_store( + $page . "." . + $this->scraper . + $this->requestid, + gzdeflate($proxy . "," . $salt.$iv.$out.$tag), + 900 // cache information for 15 minutes blaze it + ); + + return + $this->scraper . $this->requestid . "." . + rtrim(strtr(base64_encode($password), '+/', '-_'), '='); + } + + public function get($npt, $page){ + + $page = $page[0]; + $explode = explode(".", $npt, 2); + + if(count($explode) !== 2){ + + throw new Exception("Malformed nextPageToken!"); + } + + $apcu = $page . "." . 
$explode[0]; + $key = $explode[1]; + + $payload = apcu_fetch($apcu); + + if($payload === false){ + + throw new Exception("The nextPageToken is invalid or has expired!"); + } + + $key = + base64_decode( + str_pad( + strtr($key, '-_', '+/'), + strlen($key) % 4, + '=', + STR_PAD_RIGHT + ) + ); + + $payload = gzinflate($payload); + + // get proxy + [ + $proxy, + $payload + ] = explode(",", $payload, 2); + + $key = + hash_pbkdf2( + "sha512", + $key, + substr($payload, 0, 16), // salt + 20000, + 32, + true + ); + $ivlen = openssl_cipher_iv_length("aes-256-gcm"); + + $payload = + openssl_decrypt( + substr( + $payload, + 16 + $ivlen, + -16 + ), + "aes-256-gcm", + $key, + OPENSSL_RAW_DATA, + substr($payload, 16, $ivlen), + substr($payload, -16) + ); + + if($payload === false){ + + throw new Exception("The nextPageToken is invalid or has expired!"); + } + + // remove the key after using + apcu_delete($apcu); + + return [$payload, $proxy]; + } +} diff --git a/lib/captcha_gen.php b/lib/captcha_gen.php index 80bc665..6728747 100644 --- a/lib/captcha_gen.php +++ b/lib/captcha_gen.php @@ -4,6 +4,19 @@ class captcha{ public function __construct($frontend, $get, $filters, $page, $output){ + // check if we want captcha + if(config::BOT_PROTECTION !== 1){ + + if($output === true){ + $frontend->loadheader( + $get, + $filters, + $page + ); + } + return; + } + /* Validate cookie, if it exists */ @@ -46,6 +59,7 @@ class captcha{ if($output === false){ + http_response_code(429); // too many reqs echo json_encode([ "status" => "The \"pass\" token in your cookies is missing or has expired!!" 
]); @@ -184,15 +198,6 @@ class captcha{ } } - /* - Generate random grid data to pass to captcha.php - */ - $dataset = [ - ["birds", 2263], - ["fumo_plushies", 1006], - ["minecraft", 848] - ]; - // get the positions for the answers // will return between 3 and 6 answer positions $range = range(0, 15); @@ -216,17 +221,18 @@ class captcha{ } // choose a dataset - $choosen = &$dataset[random_int(0, count($dataset) - 1)]; + $c = count(config::CAPTCHA_DATASET); + $choosen = config::CAPTCHA_DATASET[random_int(0, $c - 1)]; $choices = []; - for($i=0; $i<count($dataset); $i++){ + for($i=0; $i<$c; $i++){ - if($dataset[$i][0] == $choosen[0]){ + if(config::CAPTCHA_DATASET[$i][0] == $choosen[0]){ continue; } - $choices[] = $dataset[$i]; + $choices[] = config::CAPTCHA_DATASET[$i]; } // generate grid data diff --git a/lib/curlproxy.php b/lib/curlproxy.php index ef9085b..f1ce2a7 100644 --- a/lib/curlproxy.php +++ b/lib/curlproxy.php @@ -152,7 +152,7 @@ class proxy{ $curl, CURLOPT_HTTPHEADER, [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", + "User-Agent: " . config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip, deflate", @@ -180,7 +180,7 @@ class proxy{ $curl, CURLOPT_HTTPHEADER, [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", + "User-Agent: " . config::USER_AGENT, "Accept: image/avif,image/webp,*/*", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip, deflate", @@ -379,7 +379,7 @@ class proxy{ $curl, CURLOPT_HTTPHEADER, [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", + "User-Agent: " . 
config::USER_AGENT, "Accept: image/avif,image/webp,*/*", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip, deflate, br", @@ -395,7 +395,7 @@ class proxy{ $curl, CURLOPT_HTTPHEADER, [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", + "User-Agent: " . config::USER_AGENT, "Accept: audio/webm,audio/ogg,audio/wav,audio/*;q=0.9,application/ogg;q=0.7,video/*;q=0.6,*/*;q=0.5", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip, deflate, br", diff --git a/lib/frontend.php b/lib/frontend.php index 97c8c5b..0f9f95d 100644 --- a/lib/frontend.php +++ b/lib/frontend.php @@ -4,6 +4,41 @@ class frontend{ public function load($template, $replacements = []){ + $replacements["server_name"] = htmlspecialchars(config::SERVER_NAME); + $replacements["version"] = config::VERSION; + + if(isset($_COOKIE["theme"])){ + + $theme = str_replace(["/", "."], "", $_COOKIE["theme"]); + + if( + $theme != "Dark" && + !is_file("static/themes/" . $theme . ".css") + ){ + + $theme = config::DEFAULT_THEME; + } + }else{ + + $theme = config::DEFAULT_THEME; + } + + if($theme != "Dark"){ + + $replacements["style"] = '<link rel="stylesheet" href="/static/themes/' . $theme . '.css?v' . config::VERSION . '">'; + }else{ + + $replacements["style"] = ""; + } + + if(isset($_COOKIE["scraper_ac"])){ + + $replacements["ac"] = '?ac=' . htmlspecialchars($_COOKIE["scraper_ac"]); + }else{ + + $replacements["ac"] = ''; + } + $handle = fopen("template/{$template}", "r"); $data = fread($handle, filesize("template/{$template}")); fclose($handle); @@ -29,30 +64,6 @@ class frontend{ return trim($html); } - public function getthemeclass($raw = true){ - - if( - isset($_COOKIE["theme"]) && - $_COOKIE["theme"] == "cream" - ){ - - $body_class = "theme-white "; - }else{ - - $body_class = ""; - } - - if( - $raw && - $body_class != "" - ){ - - return ' class="' . rtrim($body_class) . 
'"'; - } - - return $body_class; - } - public function loadheader(array $get, array $filters, string $page){ echo @@ -62,8 +73,7 @@ class frontend{ "index" => "no", "search" => htmlspecialchars($get["s"]), "tabs" => $this->generatehtmltabs($page, $get["s"]), - "filters" => $this->generatehtmlfilters($filters, $get), - "body_class" => $this->getthemeclass() + "filters" => $this->generatehtmlfilters($filters, $get) ]); if( @@ -74,18 +84,17 @@ class frontend{ ){ // bot detected !! - echo - $this->drawerror( - "Tshh, blocked!", - 'You were blocked from viewing this page. If you wish to scrape data from 4get, please consider running <a href="https://git.lolcat.ca/lolcat/4get" rel="noreferrer nofollow">your own 4get instance</a> or using <a href="/api.txt">the API</a>.', - ); + $this->drawerror( + "Tshh, blocked!", + 'You were blocked from viewing this page. If you wish to scrape data from 4get, please consider running <a href="https://git.lolcat.ca/lolcat/4get" rel="noreferrer nofollow">your own 4get instance</a> or using <a href="/api.txt">the API</a>.', + ); die(); } } public function drawerror($title, $error){ - return + echo $this->load("search.html", [ "class" => "", "right-left" => "", @@ -96,6 +105,23 @@ class frontend{ $error . '</div>' ]); + die(); + } + + public function drawscrapererror($error, $get, $target){ + + $this->drawerror( + "Shit", + 'This scraper returned an error:' . + '<div class="code">' . htmlspecialchars($error) . '</div>' . + 'Things you can try:' . + '<ul>' . + '<li>Use a different scraper</li>' . + '<li>Remove keywords that could cause errors</li>' . + '<li><a href="/instances?target=' . $target . "&" . $this->buildquery($get, false) . '">Try your search on another 4get instance</a></li>' . + '</ul><br>' . + 'If the error persists, please <a href="/about">contact the administrator</a>.' 
+ ); } public function drawtextresult($site, $greentext = null, $duration = null, $keywords, $tabindex = true, $customhtml = null){ @@ -819,30 +845,7 @@ class frontend{ public function getscraperfilters($page){ - $get_scraper = null; - - switch($page){ - - case "web": - $get_scraper = isset($_COOKIE["scraper_web"]) ? $_COOKIE["scraper_web"] : null; - break; - - case "images": - $get_scraper = isset($_COOKIE["scraper_images"]) ? $_COOKIE["scraper_images"] : null; - break; - - case "videos": - $get_scraper = isset($_COOKIE["scraper_videos"]) ? $_COOKIE["scraper_videos"] : null; - break; - - case "news": - $get_scraper = isset($_COOKIE["scraper_news"]) ? $_COOKIE["scraper_news"] : null; - break; - - case "music": - $get_scraper = isset($_COOKIE["scraper_news"]) ? $_COOKIE["scraper_news"] : null; - break; - } + $get_scraper = isset($_COOKIE["scraper_$page"]) ? $_COOKIE["scraper_$page"] : null; if( isset($_GET["scraper"]) && @@ -1148,32 +1151,8 @@ class frontend{ break; case "_SEARCH": - - // get search string & bang - $sanitized[$parameter] = trim($sanitized[$parameter]); - $sanitized["bang"] = ""; - - if( - strlen($sanitized[$parameter]) !== 0 && - $sanitized[$parameter][0] == "!" - ){ - - $sanitized[$parameter] = explode(" ", $sanitized[$parameter], 2); - - $sanitized["bang"] = trim($sanitized[$parameter][0]); - - if(count($sanitized[$parameter]) === 2){ - - $sanitized[$parameter] = trim($sanitized[$parameter][1]); - }else{ - - $sanitized[$parameter] = ""; - } - - $sanitized["bang"] = ltrim($sanitized["bang"], "!"); - } - - $sanitized[$parameter] = ltrim($sanitized[$parameter], "! 
\n\r\t\v\x00"); + // get search string + $sanitized["s"] = trim($sanitized[$parameter]); } } } diff --git a/lib/fuckhtml.php b/lib/fuckhtml.php index 5c65417..cb5d38d 100644 --- a/lib/fuckhtml.php +++ b/lib/fuckhtml.php @@ -442,5 +442,3 @@ class fuckhtml{ return json_decode($json_out, true); } } - -?> diff --git a/lib/nextpage.php b/lib/nextpage.php deleted file mode 100644 index 7516667..0000000 --- a/lib/nextpage.php +++ /dev/null @@ -1,106 +0,0 @@ -<?php - -class nextpage{ - - public function __construct($scraper){ - - $this->scraper = $scraper; - } - - public function store($payload, $page){ - - $page = $page[0]; - $password = random_bytes(256); // 2048 bit - $salt = random_bytes(16); - $key = hash_pbkdf2("sha512", $password, $salt, 20000, 32, true); - $iv = - random_bytes( - openssl_cipher_iv_length("aes-256-gcm") - ); - - $tag = ""; - $out = openssl_encrypt($payload, "aes-256-gcm", $key, OPENSSL_RAW_DATA, $iv, $tag, "", 16); - - $key = apcu_inc("key", 1); - - apcu_store( - $page . "." . - $this->scraper . - (string)$key, - gzdeflate($salt.$iv.$out.$tag), - 900 // cache information for 15 minutes blaze it - ); - - return - $this->scraper . $key . "." . - rtrim(strtr(base64_encode($password), '+/', '-_'), '='); - } - - public function get($npt, $page){ - - $page = $page[0]; - $explode = explode(".", $npt, 2); - - if(count($explode) !== 2){ - - throw new Exception("Malformed nextPageToken!"); - } - - $apcu = $page . "." . 
$explode[0]; - $key = $explode[1]; - - $payload = apcu_fetch($apcu); - - if($payload === false){ - - throw new Exception("The nextPageToken is invalid or has expired!"); - } - - $key = - base64_decode( - str_pad( - strtr($key, '-_', '+/'), - strlen($key) % 4, - '=', - STR_PAD_RIGHT - ) - ); - - $payload = gzinflate($payload); - - $key = - hash_pbkdf2( - "sha512", - $key, - substr($payload, 0, 16), // salt - 20000, - 32, - true - ); - $ivlen = openssl_cipher_iv_length("aes-256-gcm"); - - $payload = - openssl_decrypt( - substr( - $payload, - 16 + $ivlen, - -16 - ), - "aes-256-gcm", - $key, - OPENSSL_RAW_DATA, - substr($payload, 16, $ivlen), - substr($payload, -16) - ); - - if($payload === false){ - - throw new Exception("The nextPageToken is invalid or has expired!"); - } - - // remove the key after using - apcu_delete($apcu); - - return $payload; - } -} @@ -3,6 +3,8 @@ /* Initialize random shit */ +include "data/config.php"; + include "lib/frontend.php"; $frontend = new frontend(); @@ -28,20 +30,7 @@ try{ }catch(Exception $error){ - echo - $frontend->drawerror( - "Shit", - 'This scraper returned an error:' . - '<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' . - 'Things you can try:' . - '<ul>' . - '<li>Use a different scraper</li>' . - '<li>Remove keywords that could cause errors</li>' . - '<li>Use another 4get instance</li>' . - '</ul><br>' . - 'If the error persists, please <a href="/about">contact the administrator</a>.' - ); - die(); + $frontend->drawscrapererror($error->getMessage(), $get, "music"); } $categories = [ @@ -3,6 +3,8 @@ /* Initialize random shit */ +include "data/config.php"; + include "lib/frontend.php"; $frontend = new frontend(); @@ -28,20 +30,7 @@ try{ }catch(Exception $error){ - echo - $frontend->drawerror( - "Shit", - 'This scraper returned an error:' . - '<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' . - 'Things you can try:' . - '<ul>' . - '<li>Use a different scraper</li>' . 
- '<li>Remove keywords that could cause errors</li>' . - '<li>Use another 4get instance</li>' . - '</ul><br>' . - 'If the error persists, please <a href="/about">contact the administrator</a>.' - ); - die(); + $frontend->drawscrapererror($error->getMessage(), $get, "news"); } /* diff --git a/opensearch.php b/opensearch.php new file mode 100644 index 0000000..632a533 --- /dev/null +++ b/opensearch.php @@ -0,0 +1,29 @@ +<?php + +header("Content-Type: application/xml"); +include "data/config.php"; + +$domain = + htmlspecialchars( + (strpos(strtolower($_SERVER['SERVER_PROTOCOL']), 'https') === false ? 'http' : 'https') . + '://' . $_SERVER["HTTP_HOST"] + ); + +echo + '<?xml version="1.0" encoding="UTF-8"?>' . + '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">' . + '<ShortName>' . htmlspecialchars(config::SERVER_NAME) . '</ShortName>' . + '<InputEncoding>UTF-8</InputEncoding>' . + '<Image width="16" height="16">' . $domain . '/favicon.ico</Image>' . + '<Url type="text/html" method="GET" template="' . $domain . '/web?s={searchTerms}"/>'; + +if( + isset($_GET["ac"]) && + is_string($_GET["ac"]) && + $_GET["ac"] != "disabled" +){ + + echo '<Url rel="suggestions" type="application/x-suggestions+json" template="' . $domain . '/api/v1/ac?s={searchTerms}&scraper=' . htmlspecialchars($_GET["ac"]) . 
'"/>'; +} + +echo '</OpenSearchDescription>'; @@ -1,5 +1,6 @@ <?php +include "data/config.php"; include "lib/curlproxy.php"; $proxy = new proxy(); diff --git a/scraper/brave.php b/scraper/brave.php index 93256a8..91e3f9e 100644 --- a/scraper/brave.php +++ b/scraper/brave.php @@ -7,8 +7,8 @@ class brave{ include "lib/fuckhtml.php"; $this->fuckhtml = new fuckhtml(); - include "lib/nextpage.php"; - $this->nextpage = new nextpage("brave"); + include "lib/backend.php"; + $this->backend = new backend("brave"); } public function getfilters($page){ @@ -138,13 +138,20 @@ class brave{ "maybe" => "Maybe", "no" => "No" ] + ], + "spellcheck" => [ + "display" => "Spellcheck", + "option" => [ + "yes" => "Yes", + "no" => "No" + ] ] ]; break; } } - private function get($url, $get = [], $nsfw, $country){ + private function get($proxy, $url, $get = [], $nsfw, $country){ switch($nsfw){ @@ -159,7 +166,7 @@ class brave{ } $headers = [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "User-Agent: " . 
config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -190,11 +197,12 @@ class brave{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); if(curl_errno($curlproc)){ - throw new Exception(curl_error($curlproc)); } @@ -207,7 +215,9 @@ class brave{ if($get["npt"]){ // get next page data - $q = json_decode($this->nextpage->get($get["npt"], "web"), true); + [$q, $proxy] = $this->backend->get($get["npt"], "web"); + + $q = json_decode($q, true); $search = $q["q"]; $q["spellcheck"] = "0"; @@ -222,7 +232,6 @@ class brave{ // get _GET data instead $search = $get["s"]; - if(strlen($search) === 0){ throw new Exception("Search term is empty!"); @@ -230,9 +239,10 @@ class brave{ if(strlen($search) > 2048){ - throw new Exception("Search query is too long!"); + throw new Exception("Search term is too long!"); } + $proxy = $this->backend->get_ip(); $nsfw = $get["nsfw"]; $country = $get["country"]; $older = $get["older"]; @@ -288,6 +298,7 @@ class brave{ try{ $html = $this->get( + $proxy, "https://search.brave.com/search", $q, $nsfw, @@ -361,9 +372,10 @@ class brave{ $q["country"] = $country; $out["npt"] = - $this->nextpage->store( + $this->backend->store( json_encode($q), - "web" + "web", + $proxy ); } } @@ -759,7 +771,9 @@ class brave{ "description" => isset($result["review"]["description"]) ? 
$this->limitstrlen( - $result["review"]["description"] + strip_tags( + $result["review"]["description"] + ) ) : $this->titledots( $this->fuckhtml @@ -839,6 +853,32 @@ class brave{ "value" => $this->titledots($info["long_desc"]) ]; } + + // parse ratings + if( + isset($info["ratings"]) && + $info["ratings"] != "void 0" + ){ + + $description[] = [ + "type" => "title", + "value" => "Ratings" + ]; + + foreach($info["ratings"] as $rating){ + + $description[] = [ + "type" => "link", + "url" => $rating["profile"]["url"], + "value" => $rating["profile"]["name"] + ]; + + $description[] = [ + "type" => "text", + "value" => ": " . $rating["ratingValue"] . "/" . $rating["bestRating"] . "\n" + ]; + } + } } $table = []; @@ -908,9 +948,9 @@ class brave{ $out["video"][] = [ "title" => $this->titledots($video["title"]), "description" => $this->titledots($video["description"]), - "date" => isset($video["age"]) ? strtotime($video["age"]) : null, - "duration" => isset($video["video"]["duration"]) ? $this->hms2int($video["video"]["duration"]) : null, - "views" => null, + "date" => isset($video["age"]) && $video["age"] != "void 0" ? strtotime($video["age"]) : null, + "duration" => isset($video["video"]["duration"]) && $video["video"]["duration"] != "void 0" ? $this->hms2int($video["video"]["duration"]) : null, + "views" => isset($video["video"]["views"]) && $video["video"]["views"] != "void 0" ? (int)$video["video"]["views"] : null, "thumb" => isset($video["thumbnail"]["src"]) ? 
[ @@ -1008,37 +1048,75 @@ class brave{ public function news($get){ - $search = $get["s"]; - if(strlen($search) === 0){ + if($get["npt"]){ - throw new Exception("Search term is empty!"); - } - - $nsfw = $get["nsfw"]; - $country = $get["country"]; - - if(strlen($search) > 2048){ + [$req, $proxy] = $this->backend->get($get["npt"], "news"); - throw new Exception("Search query is too long!"); - } - /* - $handle = fopen("scraper/brave-news.html", "r"); - $html = fread($handle, filesize("scraper/brave-news.html")); - fclose($handle);*/ - try{ - $html = - $this->get( - "https://search.brave.com/news", - [ - "q" => $search - ], - $nsfw, - $country - ); + $req = json_decode($req, true); - }catch(Exception $error){ + $search = $req["q"]; + $country = $req["country"]; + $nsfw = $req["nsfw"]; + $offset = $req["offset"]; + $spellcheck = $req["spellcheck"]; - throw new Exception("Could not fetch search page"); + try{ + $html = + $this->get( + $proxy, + "https://search.brave.com/news", + [ + "q" => $search, + "offset" => $offset, + "spellcheck" => $spellcheck + ], + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not fetch search page"); + } + + }else{ + $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + if(strlen($search) > 2048){ + + throw new Exception("Search term is too long!"); + } + + $proxy = $this->backend->get_ip(); + $nsfw = $get["nsfw"]; + $country = $get["country"]; + $spellcheck = $get["spellcheck"] == "yes" ? 
"1" : "0"; + + /* + $handle = fopen("scraper/brave-news.html", "r"); + $html = fread($handle, filesize("scraper/brave-news.html")); + fclose($handle);*/ + try{ + $html = + $this->get( + $proxy, + "https://search.brave.com/news", + [ + "q" => $search, + "spellcheck" => $spellcheck + ], + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not fetch search page"); + } } $out = [ @@ -1050,6 +1128,17 @@ class brave{ // load html $this->fuckhtml->load($html); + // get npt + $out["npt"] = + $this->generatenextpagetoken( + $search, + $nsfw, + $country, + $spellcheck, + "news", + $proxy + ); + $news = $this->fuckhtml ->getElementsByClassName( @@ -1183,8 +1272,19 @@ class brave{ public function image($get){ $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + if(strlen($search) > 2048){ + + throw new Exception("Search term is too long!"); + } + $country = $get["country"]; $nsfw = $get["nsfw"]; + $spellcheck = $get["spellcheck"] == "yes" ? 
"1" : "0"; $out = [ "status" => "ok", @@ -1195,9 +1295,11 @@ class brave{ try{ $html = $this->get( + $this->backend->get_ip(), // no nextpage right now, pass proxy directly "https://search.brave.com/images", [ - "q" => $search + "q" => $search, + "spellcheck" => $spellcheck ], $nsfw, $country @@ -1261,9 +1363,75 @@ class brave{ public function video($get){ - $search = $get["s"]; - $country = $get["country"]; - $nsfw = $get["nsfw"]; + if($get["npt"]){ + + [$npt, $proxy] = $this->backend->get($get["npt"], "videos"); + + $npt = json_decode($npt, true); + $search = $npt["q"]; + $offset = $npt["offset"]; + $spellcheck = $npt["spellcheck"]; + $country = $npt["country"]; + $nsfw = $npt["nsfw"]; + + try{ + $html = + $this->get( + $proxy, + "https://search.brave.com/videos", + [ + "q" => $search, + "offset" => $offset, + "spellcheck" => $spellcheck + ], + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not fetch search page"); + } + + }else{ + + $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + if(strlen($search) > 2048){ + + throw new Exception("Search term is too long!"); + } + + $country = $get["country"]; + $nsfw = $get["nsfw"]; + $spellcheck = $get["spellcheck"] == "yes" ? 
"1" : "0"; + + $proxy = $this->backend->get_ip(); + + try{ + $html = + $this->get( + $proxy, + "https://search.brave.com/videos", + [ + "q" => $search, + "spellcheck" => $spellcheck + ], + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not fetch search page"); + } + } + + $this->fuckhtml->load($html); $out = [ "status" => "ok", @@ -1275,21 +1443,17 @@ class brave{ "reel" => [] ]; - try{ - $html = - $this->get( - "https://search.brave.com/videos", - [ - "q" => $search - ], - $nsfw, - $country - ); - - }catch(Exception $error){ - - throw new Exception("Could not fetch search page"); - } + // get npt + $out["npt"] = + $this->generatenextpagetoken( + $search, + $nsfw, + $country, + $spellcheck, + "videos", + $proxy + ); + /* $handle = fopen("scraper/brave-video.html", "r"); $html = fread($handle, filesize("scraper/brave-video.html")); @@ -1606,7 +1770,7 @@ class brave{ $data["table"][trim($html[0])] = trim($html[1]); } } - + /* private function getimagelinkfromstyle($thumb){ $thumb = @@ -1646,13 +1810,13 @@ class brave{ "url" => $url, "ratio" => "16:9" ]; - } + }*/ private function limitstrlen($text){ return explode("\n", wordwrap($text, 300, "\n"))[0]; } - + /* private function limitwhitespace($text){ return @@ -1661,7 +1825,7 @@ class brave{ " ", $text ); - } + }*/ private function titledots($title){ @@ -1678,6 +1842,52 @@ class brave{ return trim($title); } + private function generatenextpagetoken($q, $nsfw, $country, $spellcheck, $page, $proxy){ + + $nextpage = + $this->fuckhtml + ->getElementsByClassName("btn", "a"); + + if(count($nextpage) !== 0){ + + $nextpage = + $nextpage[count($nextpage) - 1]; + + if( + strtolower( + $this->fuckhtml + ->getTextContent( + $nextpage + ) + ) == "next" + ){ + + preg_match( + '/offset=([0-9]+)/', + $this->fuckhtml->getTextContent($nextpage["attributes"]["href"]), + $nextpage + ); + + return + $this->backend->store( + json_encode( + [ + "q" => $q, + "offset" => (int)$nextpage[1], + "nsfw" => 
$nsfw, + "country" => $country, + "spellcheck" => $spellcheck + ] + ), + $page, + $proxy + ); + } + } + + return null; + } + private function unshiturl($url){ // https://imgs.search.brave.com/XFnbR8Sl7ge82MBDEH7ju0UHImRovMVmQ2qnDvgNTuA/rs:fit:844:225:1/g:ce/aHR0cHM6Ly90c2U0/Lm1tLmJpbmcubmV0/L3RoP2lkPU9JUC54/UWotQXU5N2ozVndT/RDJnNG9BNVhnSGFF/SyZwaWQ9QXBp.jpeg diff --git a/scraper/ddg.php b/scraper/ddg.php index 1ce8e18..2d737ba 100644 --- a/scraper/ddg.php +++ b/scraper/ddg.php @@ -4,8 +4,11 @@ class ddg{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("ddg"); + include "lib/backend.php"; + $this->backend = new backend("ddg"); + + include "lib/fuckhtml.php"; + $this->fuckhtml = new fuckhtml(); } /* @@ -14,7 +17,7 @@ class ddg{ private const req_web = 0; private const req_xhr = 1; - private function get($url, $get = [], $reqtype = self::req_web){ + private function get($proxy, $url, $get = [], $reqtype = self::req_web){ $curlproc = curl_init(); @@ -28,7 +31,7 @@ class ddg{ switch($reqtype){ case self::req_web: $headers = - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0", + ["User-Agent: " . config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Encoding: gzip", "Accept-Language: en-US,en;q=0.5", @@ -43,7 +46,7 @@ class ddg{ case self::req_xhr: $headers = - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0", + ["User-Agent: " . 
config::USER_AGENT, "Accept: */*", "Accept-Encoding: gzip", "Accept-Language: en-US,en;q=0.5", @@ -57,6 +60,8 @@ class ddg{ break; } + $this->backend->assign_proxy($curlproc, $proxy); + curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding curl_setopt($curlproc, CURLOPT_HTTPHEADER, $headers); @@ -69,7 +74,6 @@ class ddg{ $data = curl_exec($curlproc); if(curl_errno($curlproc)){ - throw new Exception(curl_error($curlproc)); } @@ -541,9 +545,11 @@ class ddg{ public function web($get){ + $proxy = null; + if($get["npt"]){ - $jsgrep = $this->nextpage->get($get["npt"], "web"); + [$jsgrep, $proxy] = $this->backend->get($get["npt"], "web"); $extendedsearch = false; $inithtml = ""; @@ -555,6 +561,7 @@ class ddg{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $country = $get["country"]; $nsfw = $get["nsfw"]; $older = $get["older"]; @@ -614,9 +621,9 @@ class ddg{ /* Get html */ - // https://duckduckgo.com/?q=minecraft&kz=1&k1=-1&kp=-2 try{ $inithtml = $this->get( + $proxy, "https://duckduckgo.com/", $get_filters ); @@ -643,6 +650,7 @@ class ddg{ try{ $js = $this->get( + $proxy, "https://links.duckduckgo.com" . $jsgrep, [], ddg::req_xhr @@ -692,6 +700,7 @@ class ddg{ // get definition $wordnikjs = $this->get( + $proxy, "https://duckduckgo.com/js/spice/dictionary/definition/" . $wordnik, [], ddg::req_xhr @@ -725,6 +734,7 @@ class ddg{ $wordnikaudio_json = json_decode( $this->get( + $proxy, "https://duckduckgo.com/js/spice/dictionary/audio/" . $wordnik, [], ddg::req_xhr @@ -922,6 +932,7 @@ class ddg{ try{ $stackjs = $this->get( + $proxy, "https://duckduckgo.com" . 
$stack, [], ddg::req_xhr @@ -944,7 +955,7 @@ class ddg{ $out["answer"][] = [ "title" => $stackjson["Heading"], - "description" => $this->htmltoarray($stackjson["Abstract"]), + "description" => $this->stackoverflow_parse($stackjson["Abstract"]), "url" => str_replace(["http://", "ddg"], ["https://", ""], $stackjson["AbstractURL"]), "thumb" => null, "table" => [], @@ -973,6 +984,7 @@ class ddg{ try{ $lyricsjs = $this->get( + $proxy, "https://duckduckgo.com" . $lyrics, [], ddg::req_xhr @@ -1166,13 +1178,13 @@ class ddg{ if(isset($answers[$i]["data"]["AbstractText"]) && !empty($answers[$i]["data"]["AbstractText"])){ - $description = $this->htmltoarray($answers[$i]["data"]["AbstractText"]); + $description = $this->stackoverflow_parse($answers[$i]["data"]["AbstractText"]); }elseif(isset($answers[$i]["data"]["Abstract"]) && !empty($answers[$i]["data"]["Abstract"])){ - $description = $this->htmltoarray($answers[$i]["data"]["Abstract"]); + $description = $this->stackoverflow_parse($answers[$i]["data"]["Abstract"]); }elseif(isset($answers[$i]["data"]["Answer"]) && !empty($answers[$i]["data"]["Answer"])){ - $description = $this->htmltoarray($answers[$i]["data"]["Answer"]); + $description = $this->stackoverflow_parse($answers[$i]["data"]["Answer"]); }else{ $description = []; @@ -1310,6 +1322,7 @@ class ddg{ $description = []; $shitcoinjs = $this->get( + $proxy, "https://duckduckgo.com/js/spice/cryptocurrency/{$shitcoins[1]}/{$shitcoins[2]}/1", [], ddg::req_xhr @@ -1408,6 +1421,7 @@ class ddg{ try{ $currencyjs = $this->get( + $proxy, "https://duckduckgo.com/js/spice/currency/{$amount}/" . strtolower($currencies[1]) . "/" . strtolower($currencies[2]), [], ddg::req_xhr @@ -1607,7 +1621,7 @@ class ddg{ // store next page token if(isset($web[$i]["n"])){ - $out["npt"] = $this->nextpage->store($web[$i]["n"] . "&biaexp=b&eslexp=a&litexp=c&msvrtexp=b&wrap=1", "web"); + $out["npt"] = $this->backend->store($web[$i]["n"] . 
"&biaexp=b&eslexp=a&litexp=c&msvrtexp=b&wrap=1", "web", $proxy); continue; } @@ -1874,10 +1888,11 @@ class ddg{ if($get["npt"]){ - $npt = $this->nextpage->get($get["npt"], "images"); + [$npt, $proxy] = $this->backend->get($get["npt"], "images"); try{ $json = json_decode($this->get( + $proxy, "https://duckduckgo.com/i.js?" . $npt, [], ddg::req_xhr @@ -1895,6 +1910,7 @@ class ddg{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $country = $get["country"]; $nsfw = $get["nsfw"]; $date = $get["date"]; @@ -1934,6 +1950,7 @@ class ddg{ try{ $html = $this->get( + $proxy, "https://duckduckgo.com", $get_filters, ddg::req_web @@ -1980,6 +1997,7 @@ class ddg{ try{ $json = json_decode($this->get( + $proxy, "https://duckduckgo.com/i.js", $js_params, ddg::req_xhr @@ -2005,10 +2023,11 @@ class ddg{ } $out["npt"] = - $this->nextpage->store( + $this->backend->store( explode("?", $json["next"])[1] . "&vqd=" . $vqd, - "images" + "images", + $proxy ); } @@ -2046,10 +2065,11 @@ class ddg{ if($get["npt"]){ - $npt = $this->nextpage->get($get["npt"], "videos"); + [$npt, $proxy] = $this->backend->get($get["npt"], "videos"); try{ $json = json_decode($this->get( + $proxy, "https://duckduckgo.com/v.js?" . 
$npt, [], @@ -2068,6 +2088,7 @@ class ddg{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $country = $get["country"]; $nsfw = $get["nsfw"]; $date = $get["date"]; @@ -2099,6 +2120,7 @@ class ddg{ try{ $html = $this->get( + $proxy, "https://duckduckgo.com", $get_filters, ddg::req_web @@ -2123,6 +2145,7 @@ class ddg{ try{ $json = json_decode($this->get( + $proxy, "https://duckduckgo.com/v.js", [ "l" => "us-en", @@ -2155,9 +2178,10 @@ class ddg{ if(isset($json["next"])){ $out["npt"] = - $this->nextpage->store( + $this->backend->store( explode("?", $json["next"])[1], - "videos" + "videos", + $proxy ); } @@ -2213,11 +2237,12 @@ class ddg{ if($get["npt"]){ - $req = $this->nextpage->get($get["npt"], "news"); + [$req, $proxy] = $this->backend->get($get["npt"], "news"); try{ $json = json_decode($this->get( + $proxy, "https://duckduckgo.com/news.js?" . $req, [], @@ -2236,6 +2261,7 @@ class ddg{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $country = $get["country"]; $nsfw = $get["nsfw"]; $date = $get["date"]; @@ -2261,6 +2287,7 @@ class ddg{ try{ $html = $this->get( + $proxy, "https://duckduckgo.com", $get_params, ddg::req_web @@ -2303,6 +2330,7 @@ class ddg{ } $json = json_decode($this->get( + $proxy, "https://duckduckgo.com/news.js", $js_params, ddg::req_xhr @@ -2323,9 +2351,10 @@ class ddg{ if(isset($json["next"])){ $out["npt"] = - $this->nextpage->store( + $this->backend->store( explode("?", $json["next"])[1], - "news" + "news", + $proxy ); } @@ -2415,192 +2444,193 @@ class ddg{ return "https://" . $parse["host"] . "/th?id=" . urlencode($parts["id"]); } - private function htmltoarray($html){ + private function appendtext($payload, &$text, &$index){ - $html = strip_tags($html, ["img", "pre", "code", "br", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "a"]); - - libxml_use_internal_errors(true); - $dom = new DOMDocument("1.0", "utf-8"); - $dom->loadHTML('<div>' . $html . 
'</div>'); - $xpath = new DOMXPath($dom); - $descendants = $xpath->query('//div/node()'); - - $images = $xpath->query('//div/node()/img'); - $imageiterator = 0; + if(trim($payload) == ""){ + + return; + } - if(count($descendants) === 0){ + if( + $index !== 0 && + $text[$index - 1]["type"] == "text" + ){ - return [ + $text[$index - 1]["value"] .= preg_replace('/ $/', " ", $payload); + }else{ + + $text[] = [ "type" => "text", - "value" => $this->unescapehtml($html) + "value" => preg_replace('/ $/', " ", $payload) ]; + $index++; } + } + + private function stackoverflow_parse($html){ - $array = []; - $previoustype = null; + $i = 0; + $answer = []; - foreach($descendants as $node){ - - // $node->nodeValue = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $node->nodeValue); + $this->fuckhtml->load($html); + + $tags = $this->fuckhtml->getElementsByTagName("*"); + + if(count($tags) === 0){ - // get node type - switch($node->nodeName){ - case "#text": - $type = "text"; - break; - - case "pre": - $type = "code"; - break; - - case "code": - $type = "inline_code"; - break; - - case "h1": - case "h2": - case "h3": - case "h4": - case "h5": - case "h6": - $type = "title"; - break; - - case "blockquote": - $type = "quote"; - break; - - case "a": - $type = "link"; - break; - - case "img": - $type = "image"; - break; - } + return [ + [ + "type" => "text", + "value" => htmlspecialchars_decode($html) + ] + ]; + } + + foreach($tags as $snippet){ - // add node to array - switch($type){ + switch($snippet["tagName"]){ - case "text": - $value = preg_replace( - '/ {2,}/', - " ", - $this->limitnewlines($this->unescapehtml($node->textContent)) - ); + case "p": + $this->fuckhtml->load($snippet["innerHTML"]); - if( - $previoustype == "quote" || - $previoustype === null || - $previoustype == "image" || - $previoustype == "title" || - $previoustype == "code" - ){ - - $value = ltrim($value); - } + $codetags = + $this->fuckhtml + ->getElementsByTagName("*"); - if($value == ""){ - - $previoustype = $type; - 
continue 2; - } + $tmphtml = $snippet["innerHTML"]; - // merge with previous text node - if($previoustype == "text"){ + foreach($codetags as $tag){ - $array[count($array) - 1]["value"] = trim($array[count($array) - 1]["value"]) . "\n" . $this->bstoutf8($value); - }else{ + if(!isset($tag["outerHTML"])){ + + continue; + } - $array[] = [ - "type" => "text", - "value" => $this->bstoutf8($value) - ]; + $tmphtml = + explode( + $tag["outerHTML"], + $tmphtml, + 2 + ); + + $value = $this->fuckhtml->getTextContent($tmphtml[0], false, false); + $this->appendtext($value, $answer, $i); + + $type = null; + switch($tag["tagName"]){ + + case "code": $type = "inline_code"; break; + case "em": $type = "italic"; break; + case "blockquote": $type = "quote"; break; + default: $type = "text"; + } + + if($type !== null){ + $value = $this->fuckhtml->getTextContent($tag, false, false); + + if(trim($value) != ""){ + + $answer[] = [ + "type" => $type, + "value" => rtrim($value) + ]; + $i++; + } + } + + if(count($tmphtml) === 2){ + + $tmphtml = $tmphtml[1] . 
"\n"; + }else{ + + break; + } } - break; - - case "inline_code": - case "bold": - $array[] = [ - "type" => "inline_code", - "value" => $this->bstoutf8(trim($this->limitnewlines($this->unescapehtml($node->textContent)))) - ]; - break; - - case "link": - // check for link nested inside of image - if(strlen($node->childNodes->item(0)->textContent) !== 0){ + if(is_array($tmphtml)){ - $array[] = [ - "type" => "link", - "value" => $this->bstoutf8(trim($this->unescapehtml($node->textContent))), - "url" => $this->bstoutf8(preg_replace('/\/ddg$/', "", preg_replace('/^http:\/\//', "https://", $this->sanitizeurl($node->getAttribute("href"))))) - ]; - break; + $tmphtml = $tmphtml[0]; } - $type = "image"; - - if($previoustype == "text"){ + if(strlen($tmphtml) !== 0){ - $array[count($array) - 1]["value"] = rtrim($array[count($array) - 1]["value"]); + $value = $this->fuckhtml->getTextContent($tmphtml, true, false); + $this->appendtext($value, $answer, $i); } - - $array[] = [ - "type" => "image", - "url" => $this->bstoutf8(preg_replace('/^http:\/\//', "https://", preg_replace('/^\/\/images\.duckduckgo\.com\/iu\/\?u=/', "", $images->item($imageiterator)->getAttribute("src")))) - ]; - - $imageiterator++; - break; - case "image": - - if($previoustype == "text"){ - - $array[count($array) - 1]["value"] = rtrim($array[count($array) - 1]["value"]); - } - - $array[] = [ + case "img": + $answer[] = [ "type" => "image", - "url" => $this->bstoutf8(preg_replace('/^http:\/\//', "https://", preg_replace('/^\/\/images\.duckduckgo\.com\/iu\/\?u=/', "", $node->getAttribute("src")))) + "url" => + $this->fuckhtml + ->getTextContent( + $tag["attributes"]["src"] + ) ]; + $i++; break; - case "quote": - case "title": - case "code": - if($previoustype == "text"){ + case "pre": + switch($answer[$i - 1]["type"]){ - $array[count($array) - 1]["value"] = rtrim($array[count($array) - 1]["value"]); + case "text": + case "italic": + $answer[$i - 1]["value"] = rtrim($answer[$i - 1]["value"]); + break; } - // no 
break - - default: - $value = trim($this->limitnewlines($this->unescapehtml($node->textContent))); - if($type != "code"){ - - $value = preg_replace( - '/ {2,}/', - " ", - $value + $answer[] = + [ + "type" => "code", + "value" => + rtrim( + $this->fuckhtml + ->getTextContent( + $snippet, + true, + false + ) + ) + ]; + $i++; + + break; + + case "ol": + $o = 0; + + $this->fuckhtml->load($snippet); + $li = + $this->fuckhtml + ->getElementsByTagName("li"); + + foreach($li as $elem){ + $o++; + + $this->appendtext( + $o . ". " . + $this->fuckhtml + ->getTextContent( + $elem + ), + $answer, + $i ); } - - $array[] = [ - "type" => $type, - "value" => $this->bstoutf8($value) - ]; break; } + } + + if( + $i !== 0 && + $answer[$i - 1]["type"] == "text" + ){ - $previoustype = $type; + $answer[$i - 1]["value"] = rtrim($answer[$i - 1]["value"]); } - return $array; + return $answer; } private function bstoutf8($bs){ diff --git a/scraper/facebook.php b/scraper/facebook.php index 7bd576b..395a863 100644 --- a/scraper/facebook.php +++ b/scraper/facebook.php @@ -9,6 +9,9 @@ class facebook{ include "lib/nextpage.php"; $this->nextpage = new nextpage("fb"); + + include "lib/proxy_pool.php"; + $this->proxy = new proxy_pool("facebook"); } public function getfilters($page){ @@ -104,6 +107,8 @@ class facebook{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->proxy->assign_proxy($curlproc); $data = curl_exec($curlproc); diff --git a/scraper/ftm.php b/scraper/ftm.php index af39c12..0cdfbb3 100644 --- a/scraper/ftm.php +++ b/scraper/ftm.php @@ -4,8 +4,8 @@ class ftm{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("ftm"); + include "lib/backend.php"; + $this->backend = new backend("ftm"); } public function getfilters($page){ @@ -13,7 +13,7 @@ class ftm{ return []; } - private function get($url, $search, $offset){ + private function 
get($proxy, $url, $search, $offset){ $curlproc = curl_init(); @@ -29,7 +29,7 @@ class ftm{ curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding curl_setopt($curlproc, CURLOPT_HTTPHEADER, - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + ["User-Agent: " . config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -56,6 +56,8 @@ class ftm{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -70,8 +72,6 @@ class ftm{ public function image($get){ - $search = $get["s"]; - $out = [ "status" => "ok", "npt" => null, @@ -80,16 +80,28 @@ class ftm{ if($get["npt"]){ - $count = (int)$this->nextpage->get($get["npt"], "images"); + [$data, $proxy] = $this->backend->get($get["npt"], "images"); + $data = json_decode($data, true); + + $count = $data["count"]; + $search = $data["search"]; }else{ + $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + $count = 0; + $proxy = $this->backend->get_ip(); } try{ $json = json_decode( $this->get( + $proxy, "https://findthatmeme.com/api/v1/search", $search, $count @@ -134,14 +146,15 @@ class ftm{ ]; } - if($count === 50){ - - $out["npt"] = - $this->nextpage->store( - $count, - "images" - ); - } + $out["npt"] = + $this->backend->store( + json_encode([ + "count" => $count, + "search" => $search + ]), + "images", + $proxy + ); return $out; } diff --git a/scraper/google.php b/scraper/google.php index ca77231..055d12a 100644 --- a/scraper/google.php +++ b/scraper/google.php @@ -10,8 +10,8 @@ class google{ include "lib/fuckhtml.php"; $this->fuckhtml = new fuckhtml(); - include "lib/nextpage.php"; - $this->nextpage = new 
nextpage("google"); + include "lib/backend.php"; + $this->backend = new backend("google"); } public function getfilters($page){ @@ -727,7 +727,7 @@ class google{ } } - private function get($url, $get = []){ + private function get($proxy, $url, $get = []){ $headers = [ "User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.3; pt-pt; LG-P500h-parrot Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 MMS/LG-Android-MMS-V1.0/1.2", @@ -760,6 +760,8 @@ class google{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -771,7 +773,7 @@ class google{ curl_close($curlproc); return $data; } - + /* public function web($get){ $search = $get["s"]; @@ -877,9 +879,9 @@ class google{ if(count($title) !== 0){ - /* - Container is a web link - */ + // + // Container is a web link + // $web = [ "title" => $this->titledots( @@ -1051,9 +1053,9 @@ class google{ continue; } - /* - Parse rating object - */ + // + // Parse rating object + // if($is_rating >= -1){ @@ -1102,9 +1104,9 @@ class google{ continue; } - /* - Parse standalone text - */ + // + // Parse standalone text + // $additional_info[] = $innertext; } } @@ -1194,9 +1196,9 @@ class google{ $container_title == "people also search for" ){ - /* - Parse related searches - */ + // + // Parse related searches + // $as = $this->fuckhtml ->getElementsByTagName("a"); @@ -1212,9 +1214,9 @@ class google{ continue; } - /* - Parse image carousel - */ + // + // Parse image carousel + // $title_container = $this->fuckhtml ->getElementsByClassName( @@ -1239,9 +1241,9 @@ class google{ if($title_container == "imagesview all"){ - /* - Image carousel - */ + // + // Image carousel + // $pcitem = $this->fuckhtml ->getElementsByClassName( @@ -1316,9 +1318,9 @@ class google{ } } - /* - Get next page - */ + // + // Get next page + // $as = 
$this->fuckhtml ->getElementsByTagName("a"); @@ -1340,7 +1342,7 @@ class google{ } return $out; - } + }*/ public function image($get){ @@ -1348,17 +1350,22 @@ class google{ // generate parameters if($get["npt"]){ - $params = - json_decode( - $this->nextpage->get( - $get["npt"], - "images" - ), - true + [$params, $proxy] = + $this->backend->get( + $get["npt"], + "images" ); + + $params = json_decode($params, true); }else{ $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + $proxy = $this->backend->get_ip(); $country = $get["country"]; $nsfw = $get["nsfw"]; $lang = $get["lang"]; @@ -1475,6 +1482,7 @@ class google{ try{ $html = $this->get( + $proxy, "https://www.google.com/search", $params ); @@ -1578,9 +1586,10 @@ class google{ $params["ijn"] = (int)$params["ijn"] + 1; $out["npt"] = - $this->nextpage->store( + $this->backend->store( json_encode($params), - "images" + "images", + $proxy ); }else{ @@ -1628,9 +1637,10 @@ class google{ $params["imgvl"] = $imgvl; $out["npt"] = - $this->nextpage->store( + $this->backend->store( json_encode($params), - "images" + "images", + $proxy ); } } diff --git a/scraper/imgur.php b/scraper/imgur.php index 4a16de7..23efe00 100644 --- a/scraper/imgur.php +++ b/scraper/imgur.php @@ -4,11 +4,11 @@ class imgur{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("imgur"); - include "lib/fuckhtml.php"; $this->fuckhtml = new fuckhtml(); + + include "lib/backend.php"; + $this->backend = new backend("imgur"); } public function getfilters($page){ @@ -57,7 +57,7 @@ class imgur{ ]; } - private function get($url, $get = []){ + private function get($proxy, $url, $get = []){ $curlproc = curl_init(); @@ -70,7 +70,7 @@ class imgur{ curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding curl_setopt($curlproc, CURLOPT_HTTPHEADER, - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + ["User-Agent: " . 
config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -89,6 +89,8 @@ class imgur{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -105,15 +107,14 @@ class imgur{ if($get["npt"]){ - $filter = - json_decode( - $this->nextpage->get( - $get["npt"], - "images" - ), - true + [$filter, $proxy] = + $this->backend->get( + $get["npt"], + "images" ); + $filter = json_decode($filter, true); + $search = $filter["s"]; unset($filter["s"]); @@ -134,6 +135,12 @@ class imgur{ }else{ $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + $proxy = $this->backend->get_ip(); $sort = $get["sort"]; $time = $get["time"]; $format = $get["format"]; @@ -165,6 +172,7 @@ class imgur{ try{ $html = $this->get( + $proxy, "https://imgur.com/search/$sort/$time/page/$page", $filter ); @@ -238,9 +246,10 @@ class imgur{ $filter["page"] = $page + 1; $out["npt"] = - $this->nextpage->store( + $this->backend->store( json_encode($filter), - "images" + "images", + $proxy ); } diff --git a/scraper/marginalia.php b/scraper/marginalia.php index c8ab09f..b790a97 100644 --- a/scraper/marginalia.php +++ b/scraper/marginalia.php @@ -3,7 +3,8 @@ class marginalia{ public function __construct(){ - $this->key = "public"; + include "lib/backend.php"; + $this->backend = new backend("marginalia"); } public function getfilters($page){ @@ -76,10 +77,10 @@ class marginalia{ } } - private function get($url, $get = []){ + private function get($proxy, $url, $get = []){ $headers = [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "User-Agent: " . 
config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -109,6 +110,8 @@ class marginalia{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -124,6 +127,11 @@ class marginalia{ public function web($get){ $search = [$get["s"]]; + if(strlen($get["s"]) === 0){ + + throw new Exception("Search term is empty!"); + } + $profile = $get["profile"]; $format = $get["format"]; $file = $get["file"]; @@ -184,7 +192,8 @@ class marginalia{ try{ $json = $this->get( - "https://api.marginalia.nu/{$this->key}/search/" . urlencode($search), + $this->backend->get_ip(), // no nextpage + "https://api.marginalia.nu/" . config::MARGINALIA_API_KEY . "/search/" . urlencode($search), $params ); }catch(Exception $error){ diff --git a/scraper/mojeek.php b/scraper/mojeek.php index e7e8abc..3d91c09 100644 --- a/scraper/mojeek.php +++ b/scraper/mojeek.php @@ -6,8 +6,8 @@ class mojeek{ include "lib/fuckhtml.php"; $this->fuckhtml = new fuckhtml(); - include "lib/nextpage.php"; - $this->nextpage = new nextpage("mojeek"); + include "lib/backend.php"; + $this->backend = new backend("mojeek"); } public function getfilters($page){ @@ -371,10 +371,10 @@ class mojeek{ } } - private function get($url, $get = []){ + private function get($proxy, $url, $get = []){ $headers = [ - "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "User-Agent: " . 
config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -404,6 +404,8 @@ class mojeek{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -420,11 +422,12 @@ class mojeek{ if($get["npt"]){ - $token = $this->nextpage->get($get["npt"], "web"); + [$token, $proxy] = $this->backend->get($get["npt"], "web"); try{ $html = $this->get( + $proxy, "https://www.mojeek.com" . $token, [] ); @@ -485,9 +488,12 @@ class mojeek{ $params["si"] = $domain; } + $proxy = $this->backend->get_ip(); + try{ $html = $this->get( + $proxy, "https://www.mojeek.com/search", $params ); @@ -529,88 +535,90 @@ class mojeek{ return $out; } - $this->fuckhtml->load($results[0]); - /* - Get search results + Get all search result divs */ - $results = - $this->fuckhtml - ->getElementsByTagName("li"); - - foreach($results as $result){ - - $data = [ - "title" => null, - "description" => null, - "url" => null, - "date" => null, - "type" => "web", - "thumb" => [ - "url" => null, - "ratio" => null - ], - "sublink" => [], - "table" => [] - ]; - - $this->fuckhtml->load($result); + foreach($results as $container){ - $title = + $this->fuckhtml->load($container); + $results = $this->fuckhtml - ->getElementsByClassName("title", "a")[0]; + ->getElementsByTagName("li"); - $data["title"] = - html_entity_decode( + foreach($results as $result){ + + $data = [ + "title" => null, + "description" => null, + "url" => null, + "date" => null, + "type" => "web", + "thumb" => [ + "url" => null, + "ratio" => null + ], + "sublink" => [], + "table" => [] + ]; + + $this->fuckhtml->load($result); + + $title = $this->fuckhtml - ->getTextContent( - $title["innerHTML"] - ) - ); - - $data["url"] = - html_entity_decode( + 
->getElementsByClassName("title", "a")[0]; + + $data["title"] = + html_entity_decode( + $this->fuckhtml + ->getTextContent( + $title["innerHTML"] + ) + ); + + $data["url"] = + html_entity_decode( + $this->fuckhtml + ->getTextContent( + $title["attributes"]["href"] + ) + ); + + $description = $this->fuckhtml - ->getTextContent( - $title["attributes"]["href"] - ) - ); - - $description = - $this->fuckhtml - ->getElementsByClassName( - "s", "p" - ); - - if(count($description) !== 0){ + ->getElementsByClassName( + "s", "p" + ); - $data["description"] = - $this->titledots( - html_entity_decode( - $this->fuckhtml - ->getTextContent( - $description[0] + if(count($description) !== 0){ + + $data["description"] = + $this->titledots( + html_entity_decode( + $this->fuckhtml + ->getTextContent( + $description[0] + ) ) + ); + } + + $data["date"] = + explode( + " - ", + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName("i", "p")[1] ) ); + + $data["date"] = + strtotime( + $data["date"][count($data["date"]) - 1] + ); + + $out["web"][] = $data; } - - $data["date"] = - explode( - " - ", - $this->fuckhtml - ->getTextContent( - $this->fuckhtml - ->getElementsByClassName("i", "p")[1] - ) - ); - - $data["date"] = - strtotime( - $data["date"][count($data["date"]) - 1] - ); - - $out["web"][] = $data; } /* @@ -969,12 +977,13 @@ class mojeek{ if($a["innerHTML"] == "Next"){ - $out["npt"] = $this->nextpage->store( + $out["npt"] = $this->backend->store( $this->fuckhtml ->getTextContent( $a["attributes"]["href"] ), - "web" + "web", + $proxy ); } } @@ -1001,6 +1010,7 @@ class mojeek{ try{ $html = $this->get( + $this->backend->get_ip(), "https://www.mojeek.com/search", [ "q" => $search, @@ -1011,168 +1021,139 @@ class mojeek{ throw new Exception("Failed to get HTML"); } - /* $handle = fopen("scraper/mojeek.html", "r"); $html = fread($handle, filesize("scraper/mojeek.html")); - fclose($handle);*/ - - /* - Get big, standard and smaller nodes + fclose($handle); */ - 
foreach( - [ - "results-extended", - "results-standard" - ] - as $categoryname - ){ + + $this->fuckhtml->load($html); + + $articles = + $this->fuckhtml->getElementsByTagName("article"); + + foreach($articles as $article){ + + $this->fuckhtml->load($article); + + $data = [ + "title" => null, + "author" => null, + "description" => null, + "date" => null, + "thumb" => + [ + "url" => null, + "ratio" => null + ], + "url" => null + ]; + + $a = $this->fuckhtml->getElementsByTagName("a")[0]; + + $data["title"] = + $this->fuckhtml + ->getTextContent( + $a["attributes"]["title"] + ); + + $data["url"] = + $this->fuckhtml + ->getTextContent( + $a["attributes"]["href"] + ); + + $p = $this->fuckhtml->getElementsByTagName("p"); + + $data["description"] = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "s", + $p + )[0] + ) + ); - $this->fuckhtml->load($html); + if($data["description"] == ""){ + + $data["description"] = null; + } - $categories = + // get date from big node + $date = $this->fuckhtml ->getElementsByClassName( - $categoryname, - "ul" + "date", + $p ); - - foreach($categories as $category){ + + if(count($date) !== 0){ + + $data["date"] = + strtotime( + $this->fuckhtml + ->getTextContent( + $date[0] + ) + ); + } + + // grep date + author + $s = + $this->fuckhtml + ->getElementsByClassName( + "i", + $p + )[0]; + + $this->fuckhtml->load($s); + + $a = + $this->fuckhtml + ->getElementsByTagName("a"); + + if(count($a) !== 0){ - $this->fuckhtml->load($category); + // parse big node information + $data["author"] = + $this->fuckhtml + ->getTextContent( + $a[0]["innerHTML"] + ); + }else{ - $nodes = + // parse smaller nodes + $replace = $this->fuckhtml - ->getElementsByTagName("li"); + ->getElementsByTagName("time")[0]; - foreach($nodes as $node){ - - $data = [ - "title" => null, - "author" => null, - "description" => null, - "date" => null, - "thumb" => - [ - "url" => null, - "ratio" => null - ], - "url" => null - ]; - - 
/* - Parse the results - */ - $this->fuckhtml->load($node); - - // get title + url - $a = - $this->fuckhtml - ->getElementsByTagName("a")[0]; - - $data["title"] = - $this->fuckhtml - ->getTextContent( - $a["attributes"]["title"] - ); - - $data["url"] = + $data["date"] = + strtotime( $this->fuckhtml ->getTextContent( - $a["attributes"]["href"] - ); - - // get image - $image = - $this->fuckhtml - ->getElementsByTagName("img"); - - if(count($image) !== 0){ - - $data["thumb"] = [ - "url" => - urldecode( - str_replace( - "/image?img=", - "", - $this->fuckhtml - ->getTextContent( - $image[0]["attributes"]["src"] - ) - ) - ), - "ratio" => "16:9" - ]; - } - - // get description - $description = - $this->fuckhtml - ->getElementsByClassName("s", "p"); - - if(count($description) !== 0){ - - $data["description"] = - $this->titledots( - $this->fuckhtml - ->getTextContent( - $description[0] - ) - ); - } - - // get date + time - $date = - $this->fuckhtml - ->getElementsByClassName( - "date", - "p" - ); - - $i = - $this->fuckhtml - ->getElementsByClassName("i", "p"); - - if(count($date) !== 0){ - - // we're inside a big node - $data["date"] = strtotime($date[0]["innerHTML"]); - - if(count($i) !== 0){ - - $this->fuckhtml->load($i[0]); - - $a = - $this->fuckhtml - ->getElementsByTagName("a"); - - if(count($a) !== 0){ - - $data["author"] = - $this->fuckhtml - ->getTextContent($a[0]); - } - } - }else{ - - // we're inside a small node - if(count($i) !== 0){ - - $i = - explode( - " - ", - $this->fuckhtml - ->getTextContent($i[0]) - ); - - $data["date"] = strtotime(array_pop($i)); - $data["author"] = implode(" - ", $i); - } - } - - $out["news"][] = $data; - } + $replace + ) + ); + + $s["innerHTML"] = + str_replace( + $replace["outerHTML"], + "", + $s["innerHTML"] + ); + + $data["author"] = + preg_replace( + '/ • $/', + "", + $s["innerHTML"] + ); } + + $out["news"][] = $data; } return $out; diff --git a/scraper/pinterest.php b/scraper/pinterest.php index 2bb5b71..37473a1 100644 --- 
a/scraper/pinterest.php +++ b/scraper/pinterest.php @@ -6,6 +6,9 @@ class pinterest{ include "lib/nextpage.php"; $this->nextpage = new nextpage("pinterest"); + + include "lib/proxy_pool.php"; + $this->proxy = new proxy_pool("pinterest"); } public function getfilters($page){ @@ -44,6 +47,8 @@ class pinterest{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->proxy->assign_proxy($curlproc); $data = curl_exec($curlproc); diff --git a/scraper/sc.php b/scraper/sc.php index 1f49f95..16d3931 100644 --- a/scraper/sc.php +++ b/scraper/sc.php @@ -4,10 +4,8 @@ class sc{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("sc"); - $this->client_id = "ArYppSEotE3YiXCO4Nsgid2LLqJutiww"; - $this->user_id = "766585-580597-163310-929698"; + include "lib/backend.php"; + $this->backend = new backend("sc"); } public function getfilters($page){ @@ -27,7 +25,7 @@ class sc{ ]; } - private function get($url, $get = []){ + private function get($proxy, $url, $get = []){ $curlproc = curl_init(); @@ -40,7 +38,7 @@ class sc{ curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding curl_setopt($curlproc, CURLOPT_HTTPHEADER, - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0", + ["User-Agent: " . 
config::USER_AGENT, "Accept: application/json, text/javascript, */*; q=0.01", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -58,6 +56,8 @@ class sc{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -74,7 +74,7 @@ class sc{ if($get["npt"]){ - $params = $this->nextpage->get($get["npt"], "music"); + [$params, $proxy] = $this->backend->get($get["npt"], "music"); $params = json_decode($params, true); $url = $params["url"]; @@ -101,7 +101,13 @@ class sc{ // https://api-v2.soundcloud.com/search/playlists_without_albums?q=freddie%20dredd&variant_ids=&facet=genre&user_id=630591-269800-703400-765403&client_id=iMxZgT5mfGstBj8GWJbYMvpzelS8ne0E&limit=20&offset=0&linked_partitioning=1&app_version=1693487844&app_locale=en $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + $type = $get["type"]; + $proxy = $this->backend->get_ip(); switch($type){ @@ -111,8 +117,8 @@ class sc{ "q" => $search, "variant_ids" => "", "facet" => "model", - "user_id" => $this->user_id, - "client_id" => $this->client_id, + "user_id" => config::SC_USER_ID, + "client_id" => config::SC_CLIENT_TOKEN, "limit" => 20, "offset" => 0, "linked_partitioning" => 1, @@ -127,8 +133,8 @@ class sc{ "q" => $search, "variant_ids" => "", "facet_genre" => "", - "user_id" => $this->user_id, - "client_id" => $this->client_id, + "user_id" => config::SC_USER_ID, + "client_id" => config::SC_CLIENT_TOKEN, "limit" => 20, "offset" => 0, "linked_partitioning" => 1, @@ -143,8 +149,8 @@ class sc{ "q" => $search, "variant_ids" => "", "facet" => "place", - "user_id" => $this->user_id, - "client_id" => $this->client_id, + "user_id" => config::SC_USER_ID, + "client_id" => config::SC_CLIENT_TOKEN, "limit" => 20, "offset" => 0, "linked_partitioning" => 1, @@ -159,8 +165,8 @@ class sc{ 
"q" => $search, "variant_ids" => "", "facet" => "genre", - "user_id" => $this->user_id, - "client_id" => $this->client_id, + "user_id" => config::SC_USER_ID, + "client_id" => config::SC_CLIENT_TOKEN, "limit" => 20, "offset" => 0, "linked_partitioning" => 1, @@ -175,8 +181,8 @@ class sc{ "q" => $search, "variant_ids" => "", "facet" => "genre", - "user_id" => $this->user_id, - "client_id" => $this->client_id, + "user_id" => config::SC_USER_ID, + "client_id" => config::SC_CLIENT_TOKEN, "limit" => 20, "offset" => 0, "linked_partitioning" => 1, @@ -192,8 +198,8 @@ class sc{ "variant_ids" => "", "filter.content_tier" => "SUB_HIGH_TIER", "facet" => "genre", - "user_id" => $this->user_id, - "client_id" => $this->client_id, + "user_id" => config::SC_USER_ID, + "client_id" => config::SC_CLIENT_TOKEN, "limit" => 20, "offset" => 0, "linked_partitioning" => 1, @@ -206,7 +212,7 @@ class sc{ try{ - $json = $this->get($url, $params); + $json = $this->get($proxy, $url, $params); }catch(Exception $error){ @@ -244,9 +250,10 @@ class sc{ $params["url"] = $url; // we will remove this later $out["npt"] = - $this->nextpage->store( + $this->backend->store( json_encode($params), - "music" + "music", + $proxy ); } @@ -342,7 +349,7 @@ class sc{ "endpoint" => "audio_sc", "url" => $item["media"]["transcodings"][0]["url"] . - "?client_id=" . $this->client_id . + "?client_id=" . config::SC_CLIENT_TOKEN . "&track_authorization=" . 
$item["track_authorization"] ]; diff --git a/scraper/wiby.php b/scraper/wiby.php index a1daf57..e8351bc 100644 --- a/scraper/wiby.php +++ b/scraper/wiby.php @@ -4,8 +4,8 @@ class wiby{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("wiby"); + include "lib/backend.php"; + $this->backend = new backend("wiby"); } public function getfilters($page){ @@ -36,7 +36,7 @@ class wiby{ ]; } - private function get($url, $get = [], $nsfw){ + private function get($proxy, $url, $get = [], $nsfw){ $curlproc = curl_init(); @@ -45,11 +45,13 @@ class wiby{ $url .= "?" . $get; } + print_r([$proxy, $url]); + curl_setopt($curlproc, CURLOPT_URL, $url); curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding curl_setopt($curlproc, CURLOPT_HTTPHEADER, - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + ["User-Agent: " . config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -69,6 +71,8 @@ class wiby{ curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + $this->backend->assign_proxy($curlproc, $proxy); + $data = curl_exec($curlproc); if(curl_errno($curlproc)){ @@ -84,11 +88,8 @@ class wiby{ if($get["npt"]){ - $q = - json_decode( - $this->nextpage->get($get["npt"], "web"), - true - ); + [$q, $proxy] = $this->backend->get($get["npt"], "web"); + $q = json_decode($q, true); $nsfw = $q["nsfw"]; unset($q["nsfw"]); @@ -100,6 +101,7 @@ class wiby{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $date = $get["date"]; $nsfw = $get["nsfw"] == "yes" ? 
"0" : "1"; @@ -150,6 +152,7 @@ class wiby{ try{ $html = $this->get( + $proxy, "https://wiby.me/", $q, $nsfw @@ -171,13 +174,14 @@ class wiby{ }else{ $nextpage = - $this->nextpage->store( + $this->backend->store( json_encode([ "q" => $q["q"], "p" => (int)$nextpage[1], "nsfw" => $nsfw ]), - "web" + "web", + $proxy ); } diff --git a/scraper/yandex.php b/scraper/yandex.php index 65abe73..7335edc 100644 --- a/scraper/yandex.php +++ b/scraper/yandex.php @@ -10,11 +10,11 @@ class yandex{ include "lib/fuckhtml.php"; $this->fuckhtml = new fuckhtml(); - include "lib/nextpage.php"; - $this->nextpage = new nextpage("yandex"); + include "lib/backend.php"; + // backend included in the scraper functions } - private function get($url, $get = [], $nsfw){ + private function get($proxy, $url, $get = [], $nsfw){ $curlproc = curl_init(); @@ -32,7 +32,7 @@ class yandex{ } $headers = - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0", + ["User-Agent: " . config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Encoding: gzip", "Accept-Language: en-US,en;q=0.5", @@ -54,6 +54,8 @@ class yandex{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -207,6 +209,8 @@ class yandex{ public function web($get){ + $this->backend = new backend("yandex_w"); + // has captcha // https://yandex.com/search/touch/?text=lol&app_platform=android&appsearch_header=1&ui=webmobileapp.yandex&app_version=23070603&app_id=ru.yandex.searchplugin&search_source=yandexcom_touch_native&clid=2218567 @@ -215,10 +219,11 @@ class yandex{ if($get["npt"]){ - $npt = $this->nextpage->get($get["npt"], "web"); + [$npt, $proxy] = $this->backend->get($get["npt"], "web"); $html = $this->get( + $proxy, "https://yandex.com" . 
$npt, [], "yes" @@ -226,6 +231,12 @@ class yandex{ }else{ $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + $proxy = $this->backend->get_ip(); $lang = $get["lang"]; $older = $get["older"]; $newer = $get["newer"]; @@ -269,6 +280,7 @@ class yandex{ try{ $html = $this->get( + $proxy, "https://yandex.com/search/site/", $params, "yes" @@ -313,7 +325,7 @@ class yandex{ if(count($npt) !== 0){ $out["npt"] = - $this->nextpage->store( + $this->backend->store( $this->fuckhtml ->getTextContent( $npt @@ -321,7 +333,8 @@ class yandex{ ["attributes"] ["href"] ), - "web" + "web", + $proxy ); } @@ -386,17 +399,18 @@ class yandex{ public function image($get){ + $this->backend = new backend("yandex_i"); + if($get["npt"]){ - $request = - json_decode( - $this->nextpage->get( - $get["npt"], - "images" - ), - true + [$request, $proxy] = + $this->backend->get( + $get["npt"], + "images" ); + $request = json_decode($request, true); + $nsfw = $request["nsfw"]; unset($request["nsfw"]); }else{ @@ -407,6 +421,7 @@ class yandex{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $nsfw = $get["nsfw"]; $time = $get["time"]; $size = $get["size"]; @@ -611,9 +626,11 @@ class yandex{ try{ $json = $this->get( + $proxy, "https://yandex.com/images/search", $request, - $nsfw + $nsfw, + "yandex_i" ); }catch(Exception $err){ @@ -676,7 +693,12 @@ class yandex{ $request["p"] = 1; } - $out["npt"] = $this->nextpage->store(json_encode($request), "images"); + $out["npt"] = + $this->backend->store( + json_encode($request), + "images", + $proxy + ); } // get search results @@ -744,21 +766,29 @@ class yandex{ public function video($get){ + $this->backend = new backend("yandex_v"); + if($get["npt"]){ - $params = - json_decode( - $this->nextpage->get( - $get["npt"], - "web" - ), - true + [$params, $proxy] = + $this->backend->get( + $get["npt"], + "video" ); + $params = json_decode($params, true); + $nsfw = $params["nsfw"]; 
unset($params["nsfw"]); }else{ + $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + $proxy = $this->backend->get_ip(); $nsfw = $get["nsfw"]; $time = $get["time"]; $duration = $get["duration"]; @@ -865,9 +895,11 @@ class yandex{ try{ $json = $this->get( + $proxy, "https://yandex.com/video/search", $params, - $nsfw + $nsfw, + "yandex_v" ); }catch(Exception $error){ @@ -926,9 +958,10 @@ class yandex{ $params["p"] = "1"; $params["nsfw"] = $nsfw; $out["npt"] = - $this->nextpage->store( + $this->backend->store( json_encode($params), - "web" + "video", + $proxy ); } diff --git a/scraper/yep.php b/scraper/yep.php index 8ff4a57..7a73635 100644 --- a/scraper/yep.php +++ b/scraper/yep.php @@ -4,8 +4,8 @@ class yep{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("yep"); + include "lib/backend.php"; + $this->backend = new backend("yep"); } public function getfilters($page){ @@ -238,7 +238,7 @@ class yep{ ]; } - private function get($url, $get = []){ + private function get($proxy, $url, $get = []){ $curlproc = curl_init(); @@ -251,7 +251,7 @@ class yep{ curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding curl_setopt($curlproc, CURLOPT_HTTPHEADER, - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + ["User-Agent: " . 
config::USER_AGENT, "Accept: */*", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -269,6 +269,8 @@ class yep{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -284,6 +286,11 @@ class yep{ public function image($get){ $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + $country = $get["country"]; $nsfw = $get["nsfw"]; @@ -305,6 +312,7 @@ class yep{ $json = json_decode( $this->get( + $this->backend->get_ip(), // no nextpage! "https://api.yep.com/fs/2/search", [ "client" => "web", diff --git a/scraper/youtube.php b/scraper/youtube.php index 83a68ba..526b026 100644 --- a/scraper/youtube.php +++ b/scraper/youtube.php @@ -8,8 +8,8 @@ class youtube{ public function __construct(){ - include "lib/nextpage.php"; - $this->nextpage = new nextpage("yt"); + include "lib/backend.php"; + $this->backend = new backend("yt"); } public function getfilters($page){ @@ -340,7 +340,7 @@ class youtube{ const req_web = 0; const req_xhr = 1; - private function get($url, $get = [], $reqtype = self::req_web, $continuation = null){ + private function get($proxy, $url, $get = [], $reqtype = self::req_web, $continuation = null){ $curlproc = curl_init(); @@ -354,7 +354,7 @@ class youtube{ switch($reqtype){ case self::req_web: $headers = - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + ["User-Agent: " . config::USER_AGENT, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -370,7 +370,7 @@ class youtube{ case self::req_xhr: $headers = - ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:110.0) Gecko/20100101 Firefox/110.0", + ["User-Agent: " . 
config::USER_AGENT, "Accept: */*", "Accept-Language: en-US,en;q=0.5", "Accept-Encoding: gzip", @@ -397,6 +397,8 @@ class youtube{ curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $this->backend->assign_proxy($curlproc, $proxy); $data = curl_exec($curlproc); @@ -430,17 +432,17 @@ class youtube{ $json = fread($handle, filesize("nextpage.json")); fclose($handle);*/ - $npt = - json_decode( - $this->nextpage->get( - $get["npt"], - "videos" - ), - true + [$npt, $proxy] = + $this->backend->get( + $get["npt"], + "videos" ); + $npt = json_decode($npt, true); + try{ $json = $this->get( + $proxy, "https://www.youtube.com/youtubei/v1/search", [ "key" => $npt["key"], @@ -507,6 +509,7 @@ class youtube{ throw new Exception("Search term is empty!"); } + $proxy = $this->backend->get_ip(); $date = $get["date"]; $type = $get["type"]; $duration = $get["duration"]; @@ -537,6 +540,7 @@ class youtube{ try{ $json = $this->get( + $proxy, "https://www.youtube.com/results", $get ); @@ -942,7 +946,14 @@ class youtube{ if($this->out["npt"] !== null){ - $this->out["npt"] = $this->nextpage->store(json_encode($this->out["npt"]), "videos"); + $this->out["npt"] = + $this->backend->store( + json_encode( + $this->out["npt"] + ), + "videos", + $proxy + ); } return $this->out; diff --git a/settings.php b/settings.php index 41322d6..bee31ea 100644 --- a/settings.php +++ b/settings.php @@ -1,5 +1,7 @@ <?php +include "data/config.php"; + /* Define settings */ @@ -28,16 +30,7 @@ $settings = [ [ "description" => "Theme", "parameter" => "theme", - "options" => [ - [ - "value" => "dark", - "text" => "Gruvbox dark" - ], - [ - "value" => "cream", - "text" => "Gruvbox cream" - ] - ] + "options" => [] ], [ "description" => "Prevent clicking background elements when image viewer is open", @@ -59,7 +52,7 @@ $settings = [ "name" => "Scrapers to use", "settings" => [ [ - "description" => 
"Autocomplete<br><i>Picking <div class=\"code-inline\">Auto</div> changes the source dynamically depending of the page's scraper<br>Picking <div class=\"code-inline\">Disabled</div> disables this feature</i>", + "description" => "Autocomplete<br><i>Picking <span class=\"code-inline\">Auto</span> changes the source dynamically depending of the page's scraper<br><b>Warning:</b> If you edit this field, you will need to re-add the search engine so that the new autocomplete settings are applied!</i>", "parameter" => "scraper_ac", "options" => [ [ @@ -243,6 +236,26 @@ $settings = [ ]; /* + Set theme collection +*/ +$themes = glob("static/themes/*"); + +$settings[0]["settings"][1]["options"][] = [ + "value" => "Dark", + "text" => "Dark" +]; + +foreach($themes as $theme){ + + $theme = explode(".", basename($theme))[0]; + + $settings[0]["settings"][1]["options"][] = [ + "value" => $theme, + "text" => $theme + ]; +} + +/* Set cookies */ @@ -262,28 +275,48 @@ if($_POST){ foreach($loop as $key => $value){ - foreach($settings as $title){ + if($key == "theme"){ - foreach($title["settings"] as $list){ + if($value == config::DEFAULT_THEME){ - if( - $list["parameter"] == $key && - $list["options"][0]["value"] == $value - ){ - - unset($_COOKIE[$key]); - - setcookie( - $key, - "", - [ - "expires" => -1, // removes cookie - "samesite" => "Lax", - "path" => "/" - ] - ); + unset($_COOKIE[$key]); + + setcookie( + "theme", + "", + [ + "expires" => -1, // removes cookie + "samesite" => "Lax", + "path" => "/" + ] + ); + continue; + } + }else{ + + foreach($settings as $title){ + + foreach($title["settings"] as $list){ - continue 3; + if( + $list["parameter"] == $key && + $list["options"][0]["value"] == $value + ){ + + unset($_COOKIE[$key]); + + setcookie( + $key, + "", + [ + "expires" => -1, // removes cookie + "samesite" => "Lax", + "path" => "/" + ] + ); + + continue 3; + } } } } @@ -313,19 +346,13 @@ include "lib/frontend.php"; $frontend = new frontend(); echo - '<!DOCTYPE html>' . 
- '<html lang="en">' . - '<head>' . - '<meta http-equiv="Content-Type" content="text/html;charset=utf-8">' . - '<title>Settings</title>' . - '<link rel="stylesheet" href="/static/style.css?v4">' . - '<meta name="viewport" content="width=device-width,initial-scale=1">' . - '<meta name="robots" content="index,follow">' . - '<link rel="icon" type="image/x-icon" href="/favicon.ico">' . - '<meta name="description" content="4get.ca: Settings">' . - '<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">' . - '</head>' . - '<body' . $frontend->getthemeclass() . '>'; + $frontend->load( + "header_nofilters.html", + [ + "title" => "Settings", + "class" => "" + ] + ); $left = '<h1>Settings</h1>' . @@ -376,6 +403,14 @@ foreach($settings as $title){ '<div class="title">' . $setting["description"] . '</div>' . '<select name="' . $setting["parameter"] . '">'; + if($setting["parameter"] == "theme"){ + + if(!isset($_COOKIE["theme"])){ + + $_COOKIE["theme"] = config::DEFAULT_THEME; + } + } + foreach($setting["options"] as $option){ $left .= diff --git a/sitemap.php b/sitemap.php new file mode 100644 index 0000000..6f6c095 --- /dev/null +++ b/sitemap.php @@ -0,0 +1,35 @@ +<?php + +header("Content-Type: application/xml"); +include "data/config.php"; + +$domain = + htmlspecialchars( + (strpos(strtolower($_SERVER['SERVER_PROTOCOL']), 'https') === false ? 'http' : 'https') . + '://' . $_SERVER["HTTP_HOST"] + ); + +echo + '<?xml version="1.0" encoding="UTF-8"?>' . + '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . + '<url>' . + '<loc>' . $domain . '/</loc>' . + '<lastmod>2023-07-31T07:56:12+03:00</lastmod>' . + '</url>' . + '<url>' . + '<loc>' . $domain . '/about</loc>' . + '<lastmod>2023-07-31T07:56:12+03:00</lastmod>' . + '</url>' . + '<url>' . + '<loc>' . $domain . '/instances</loc>' . + '<lastmod>2023-07-31T07:56:12+03:00</lastmod>' . + '</url>' . + '<url>' . + '<loc>' . $domain . '/settings</loc>' . 
+ '<lastmod>2023-07-31T07:56:12+03:00</lastmod>' . + '</url>' . + '<url>' . + '<loc>' . $domain . '/api.txt</loc>' . + '<lastmod>2023-07-31T07:56:12+03:00</lastmod>' . + '</url>' . + '</urlset>'; diff --git a/static/client.js b/static/client.js index 2e691f8..5935f92 100644 --- a/static/client.js +++ b/static/client.js @@ -318,11 +318,23 @@ if(image_class !== null){ image_url = htmlspecialchars_decode(image_url); } + var w = Math.round(click.target.naturalWidth); + var h = Math.round(click.target.naturalHeight); + + if( + w === 0 || + h === 0 + ){ + + w = 100; + h = 100; + } + collection = [ { "url": image_url, - "width": Math.round(click.target.naturalWidth), - "height": Math.round(click.target.naturalHeight) + "width": w, + "height": h } ]; @@ -362,10 +374,22 @@ if(image_class !== null){ var imagesize = elem.getElementsByTagName("img")[0]; + var imagesize_w = 0; + var imagesize_h = 0; + if(imagesize.complete){ - var imagesize_w = imagesize.naturalWidth; - var imagesize_h = imagesize.naturalHeight; + imagesize_w = imagesize.naturalWidth; + imagesize_h = imagesize.naturalHeight; + } + + if( + imagesize_w === 0 || + imagesize_h === 0 + ){ + + imagesize_w = 100; + imagesize_h = 100; } for(var i=0; i<collection.length; i++){ diff --git a/static/serverping.js b/static/serverping.js new file mode 100644 index 0000000..5fe285d --- /dev/null +++ b/static/serverping.js @@ -0,0 +1,495 @@ + +function htmlspecialchars(str){ + + if(str === null){ + + return "<i>&lt;Empty&gt;</i>"; + } + + var map = { + '&': '&amp;', + '<': '&lt;', + '>': '&gt;', + '"': '&quot;', + "'": '&#039;' + } + + return str.replace(/[&<>"']/g, function(m){return map[m];}); +} + +// initialize garbage +var list = []; +var pinged_list = []; +var reqs = 0; +var errors = 0; +var sort = 0; // lower ping first + +// check for instance redirect stuff +var redir = ""; +var target = "/web?"; +new URL(window.location.href) + .searchParams + .forEach( + function(value, key){ + + if(key == "target"){ + + target = "/" +
encodeURIComponent(value) + "?"; + return; + } + + if(key == "npt"){ return; } + redir += encodeURIComponent(key) + "=" + encodeURIComponent(value) + } + ); + +if(redir != ""){ + redir = target + redir; +} + +var quote = document.createElement("div"); +quote.className = "quote"; +quote.innerHTML = 'Pinged <b>0</b> servers (<b>0</b> failed requests)'; +var [div_servercount, div_failedreqs] = + quote.getElementsByTagName("b"); + +var noscript = document.getElementsByTagName("noscript")[0]; +document.body.insertBefore(quote, noscript.nextSibling); + +// create table +var table = document.createElement("table"); +table.innerHTML = + '<thead>' + + '<tr>' + + '<th><div class="arrow up"></div>Ping</th>' + + '<th class="extend">Server</th>' + + '<th>Address</th>' + + '<th>Bot protection</th>' + + '<th title="Amount of legit requests processed since the last APCU cache clear (usually happens at midnight)">Real reqs (?)</th>' + + '<th title="Amount of filtered requests processed since the last APCU cache clear (usually happens at midnight)">Bot reqs (?)</th>' + + '<th>API</th>' + + '<th>Version</th>' + + '</tr>' + + '</thead>' + + '<tbody></tbody>'; + +document.body.insertBefore(table, quote.nextSibling); + +// handle sorting clicks +var tbody = table.getElementsByTagName("tbody")[0]; +var th = table.getElementsByTagName("th"); + +for(var i=0; i<th.length; i++){ + + th[i].addEventListener("click", function(event){ + + if(event.target.className.includes("arrow")){ + + var div = event.target.parentElement; + }else{ + + var div = event.target; + } + + var arrow = div.getElementsByClassName("arrow"); + var orientation = 0; // up + + if(arrow.length === 0){ + + // delete arrow and add new one + arrow = document.getElementsByClassName("arrow"); + arrow[0].remove(); + + arrow = document.createElement("div"); + arrow.className = "arrow up"; + div.insertBefore(arrow, event.target.firstChild); + }else{ + + // switch arrow position + if(arrow[0].className == "arrow down"){ + + 
arrow[0].className = "arrow up"; + }else{ + + arrow[0].className = "arrow down"; + orientation = 1; + } + } + + switch(div.textContent.toLowerCase()){ + + case "ping": sort = orientation; break; + case "server": sort = 2 + orientation; break; + case "address": sort = 4 + orientation; break; + case "bot protection": sort = 6 + orientation; break; + case "real reqs (?)": sort = 8 + orientation; break; + case "bot reqs (?)": sort = 10 + orientation; break; + case "api": sort = 12 + orientation; break; + case "version": sort = 14 + orientation; break; + } + + render_list(); + }); +} + +function validate_url(url, allow_http = false){ + + try{ + + url = new URL(url); + if( + url.protocol == "https:" || + ( + ( + allow_http === true || + window.location.protocol == "http:" + ) && + url.protocol == "http:" + ) + ){ + + return true; + } + }catch(error){} // do nothing + + return false; +} + +function number_format(int){ + + return new Intl.NumberFormat().format(int); +} + +// parse initial server list +fetch_server(window.location.origin); + +async function fetch_server(server){ + + if(!validate_url(server)){ + console.warn("Invalid server URL: " + server); + return; + } + + // make sure baseURL is origin + server = new URL(server).origin; + // prevent multiple fetches + for(var i=0; i<list.length; i++){ + + if(list[i] == server){ + + // serber was already fetched + console.info("Already checked server: " + server); + return; + } + } + + // prevent future fetches + list.push(server); + + var data = null; + var ping = new Date().getTime(); + + try{ + + data = await fetch( + server + "/ami4get" + ); + + if(data.status !== 200){ + + // endpoint is not available + errors++; + div_failedreqs.textContent = number_format(errors); + console.warn(server + ": Invalid HTTP code " + data.status); + return; + } + + data = await data.json(); + data.server.ping = new Date().getTime() - ping; + + }catch(error){ + + errors++; + div_failedreqs.textContent = number_format(errors); + 
console.warn(server + ": Could not fetch or decode JSON"); + return; + } + + // sanitize data + if( + typeof data.status != "string" || + data.status != "ok" || + typeof data.server != "object" || + !( + typeof data.server.name == "string" || + ( + typeof data.server.name == "object" && + data.server.name === null + ) + ) || + typeof data.service != "string" || + data.service != "4get" || + ( + typeof data.server.description != "string" && + data.server.description !== null + ) || + typeof data.server.bot_protection != "number" || + typeof data.server.real_requests != "number" || + typeof data.server.bot_requests != "number" || + typeof data.server.api_enabled != "boolean" || + typeof data.server.alt_addresses != "object" || + typeof data.server.version != "number" || + typeof data.instances != "object" + ){ + + errors++; + div_failedreqs.textContent = number_format(errors); + console.warn(server + ": Malformed JSON"); + return; + } + + data.server.ip = server; + + reqs++; + div_servercount.textContent = number_format(reqs); + + var total = pinged_list.push(data) - 1; + pinged_list[total].index = total; + + render_list(); + + // get more serbers + for(var i=0; i<data.instances.length; i++){ + + fetch_server(data.instances[i]); + } +} + +function sorta(object, element, order){ + + return object.slice().sort( + function(a, b){ + + if(order){ + + return a.server[element] - b.server[element]; + } + + return b.server[element] - a.server[element]; + } + ); +} + +function textsort(object, element, order){ + + var sort = object.slice().sort( + function(a, b){ + + return a.server[element].localeCompare(b.server[element]); + } + ); + + if(!order){ + return sort.reverse(); + } + + return sort; +} + +function render_list(){ + + var sorted_list = []; + + // sort + var filter = Boolean(sort % 2); + + switch(sort){ + + case 0: + case 1: + sorted_list = sorta(pinged_list, "ping", filter === true ? 
false : true); + break; + + case 2: + case 3: + sorted_list = textsort(pinged_list, "name", filter === true ? false : true); + break; + + case 4: + case 5: + sorted_list = textsort(pinged_list, "ip", filter === true ? false : true); + break; + + case 6: + case 7: + sorted_list = sorta(pinged_list, "bot_protection", filter === true ? false : true); + break; + + case 8: + case 9: + sorted_list = sorta(pinged_list, "real_requests", filter); + break; + + case 10: + case 11: + sorted_list = sorta(pinged_list, "bot_requests", filter); + break; + + case 12: + case 13: + sorted_list = sorta(pinged_list, "api_enabled", filter); + break; + + case 14: + case 15: + sorted_list = sorta(pinged_list, "version", filter); + break; + } + + // render tabloid + var html = ""; + + for(var k=0; k<sorted_list.length; k++){ + + html += '<tr onclick="show_server(' + sorted_list[k].index + ');">'; + + for(var i=0; i<8; i++){ + + html += '<td'; + + switch(i){ + + case 0: // server ping + if(sorted_list[k].server.ping <= 100){ + + html += '><span style="color:var(--green);">' + sorted_list[k].server.ping + '</span>'; + break; + } + + if(sorted_list[k].server.ping <= 200){ + + html += '><span style="color:var(--yellow);">' + sorted_list[k].server.ping + '</span>'; + break; + } + + html += '><span style="color:var(--red);">' + number_format(sorted_list[k].server.ping) + '</span>'; + break; + + // server name + case 1: html += ' class="extend">' + htmlspecialchars(sorted_list[k].server.name); break; + case 2: html += '>' + htmlspecialchars(new URL(sorted_list[k].server.ip).host); break; + case 3: // bot protection + switch(sorted_list[k].server.bot_protection){ + + case 0: + html += '><span style="color:var(--green);">Disabled</span>'; + break; + + case 1: + html += '><span style="color:var(--yellow);">Image captcha</span>'; + break; + + case 2: + html += '><span style="color:var(--red);">Invite only</span>'; + break; + + default: + html += '>Unknown'; + } + break; + + case 4: // real reqs + 
html += '>' + number_format(sorted_list[k].server.real_requests); + break; + + case 5: // bot reqs + html += '>' + number_format(sorted_list[k].server.bot_requests); + break; + + case 6: // api enabled + + if(sorted_list[k].server.api_enabled){ + + html += '><span style="color:var(--green);">Yes</span>'; + }else{ + + html += '><span style="color:var(--red);">No</span>'; + } + break; + + // version + case 7: html += ">v" + sorted_list[k].server.version; break; + } + + html += '</td>'; + } + + html += '</tr>'; + } + + tbody.innerHTML = html; +} + +var popup_bg = document.getElementById("popup-bg"); +var popup_wrapper = document.getElementsByClassName("popup-wrapper")[0]; +var popup = popup_wrapper.getElementsByClassName("popup")[0]; +var popup_shown = false; + +popup_bg.addEventListener("click", function(){ + + popup_wrapper.style.display = "none"; + popup_bg.style.display = "none"; +}); + +function show_server(serverid){ + + var html = + '<h2>' + htmlspecialchars(pinged_list[serverid].server.name) + '</h2>' + + 'Description' + + '<div class="code">' + htmlspecialchars(pinged_list[serverid].server.description) + '</div>'; + + var url_obj = new URL(pinged_list[serverid].server.ip); + var url = htmlspecialchars(url_obj.origin); + var domain = url_obj.hostname; + + html += + 'URL: <a rel="noreferer" target="_BLANK" href="' + url + redir + '">' + url + '</a> <a rel="noreferer" target="_BLANK" href="https://browserleaks.com/ip/' + encodeURIComponent(domain) + '">(IP lookup)</a>' + + '<br><br>Alt addresses:'; + + var len = pinged_list[serverid].server.alt_addresses.length; + + if(len === 0){ + + html += ' <i>&lt;Empty&gt;</i>'; + }else{ + + html += '<ul>'; + + for(var i=0; i<len; i++){ + + var url_obj = new URL(pinged_list[serverid].server.alt_addresses[i]); + var url = htmlspecialchars(url_obj.origin); + var domain = url_obj.hostname; + + if(validate_url(pinged_list[serverid].server.alt_addresses[i], true)){ + + html += '<li><a rel="noreferer" href="' + url + redir + '" 
target="_BLANK">' + url + '</a> <a rel="noreferer" target="_BLANK" href="https://browserleaks.com/ip/' + encodeURIComponent(domain) + '">(IP lookup)</a></li>'; + }else{ + + console.warn(pinged_list[serverid].server.ip + ": Invalid peer URL => " + pinged_list[serverid].server.alt_addresses[i]); + } + } + + html += '</ul>'; + } + popup.innerHTML = html; + + popup_wrapper.style.display = "block"; + popup_bg.style.display = "block"; +} + +function hide_server(){ + + popup_wrapper.style.display = "none"; + popup_bg.style.display = "none"; +} diff --git a/static/style.css b/static/style.css index fdb4951..bb76c2e 100644 --- a/static/style.css +++ b/static/style.css @@ -1,7 +1,3 @@ - -/* - Global styles -*/ :root{ /* background */ --1d2021: #1d2021; @@ -21,31 +17,11 @@ --default: #d4be98; --keyword: #d8a657; --string: #7daea7; -} - -.theme-white{ - /* background */ - --1d2021: #bdae93; - --282828: #a89984; - --3c3836: #a89984; - --504945: #504945; - /* font */ - --928374: #1d2021; - --a89984: #282828; - --bdae93: #3c3836; - --8ec07c: #52520e; - --ebdbb2: #1d2021; - - /* code highlighter */ - --comment: #6a4400; - --default: #d4be98; - --keyword: #4a4706; - --string: #076678; -} - -.theme-white .autocomplete .entry:hover{ - background:#928374; + /* color codes for instance list */ + --green: #b8bb26; + --yellow: #d8a657; + --red: #fb4934; } audio{ @@ -516,6 +492,7 @@ h3,h4,h5,h6{ .web .favicon img, .favicon-dropdown img{ margin:3px 7px 0 0; + width:16px; height:16px; font-size:12px; line-height:16px; @@ -1020,6 +997,7 @@ table tr a:last-child{ cursor:grab; user-select:none; pointer-events:none; + z-index:5; } #popup:active{ @@ -1046,6 +1024,7 @@ table tr a:last-child{ height:35px; background:var(--1d2021); border-bottom:1px solid var(--928374); + z-index:4; } #popup-bg{ @@ -1057,6 +1036,7 @@ table tr a:last-child{ width:100%; height:100%; display:none; + z-index:3; } #popup-status select{ @@ -1167,6 +1147,108 @@ table tr a:last-child{ } /* + Instances page +*/ +.instances 
table{ + white-space:nowrap; + margin-top:17px; +} + +.instances a{ + color:var(--bdae93); +} + +.instances tbody tr:nth-child(even){ + background:var(--282828); +} + +.instances thead{ + outline:1px solid var(--928374); + outline-offset:-1px; + background:var(--3c3836); + user-select:none; + z-index:2; + position:sticky; + top:0; +} + +.instances th{ + cursor:row-resize; +} + +.instances th:hover{ + background:var(--504945); +} + +.instances tbody{ + outline:1px solid var(--504945); + outline-offset:-1px; + position:relative; + top:-1px; +} + +.instances tbody tr:hover{ + background:var(--3c3836); + cursor:pointer; +} + +.instances .arrow{ + display:inline-block; + position:relative; + top:6px; + margin-right:7px; + width:0; + height:0; + border:6px solid transparent; + border-top:10px solid var(--bdae93); +} + +.instances .arrow.up{ + top:0; + border:6px solid transparent; + border-bottom:10px solid var(--bdae93); +} + +.instances th, .instances td{ + padding:4px 7px; + width:0; +} + +.instances .extend{ + width:unset; + overflow:hidden; + max-width:200px; +} + +.instances .popup-wrapper{ + display:none; + position:fixed; + left:50%; + top:50%; + transform:translate(-50%, -50%); + width:800px; + max-width:100%; + max-height:100%; + overflow-x:auto; + padding:17px; + box-sizing:border-box; + pointer-events:none; + z-index:3; +} + +.instances .popup{ + border:1px solid var(--928374); + background:var(--282828); + padding:7px 10px; + pointer-events:initial; +} + +.instances ul{ + padding-left:20px; +} + + +/* Responsive image */ @media only screen and (max-width: 1454px){ #images .image-wrapper{ width:25%; } } @@ -1221,7 +1303,7 @@ table tr a:last-child{ width:100%; } - table td{ + body:not(.instances) table td{ display:block; width:100%; } diff --git a/static/themes/Cream.css b/static/themes/Cream.css new file mode 100644 index 0000000..3d6b615 --- /dev/null +++ b/static/themes/Cream.css @@ -0,0 +1,31 @@ +:root{ + /* background */ + --1d2021: #bdae93; + --282828: 
#a89984; + --3c3836: #a89984; + --504945: #504945; + + /* font */ + --928374: #1d2021; + --a89984: #282828; + --bdae93: #3c3836; + --8ec07c: #52520e; + --ebdbb2: #1d2021; + + /* code highlighter */ + --comment: #6a4400; + --default: #d4be98; + --keyword: #4a4706; + --string: #076678; + + /* color codes for instance list */ + --green: #636311; + --yellow: #8a6214; + --red: #711410; +} + +.autocomplete .entry:hover, +.instances th:hover +{ + background:#928374; +} diff --git a/template/about.html b/template/about.html new file mode 100644 index 0000000..12dd957 --- /dev/null +++ b/template/about.html @@ -0,0 +1,77 @@ +<a href="/" class="link">&lt; Go back</a> + +<h1>Set as default search engine</h1> +<a href="#firefox"><h2 id="firefox">On Firefox and other Gecko based browsers</h2></a> +To set this as your default search engine on Firefox, right click the URL bar and select <div class="code-inline">Add "4get"</div>. Then, visit <a href="about:preferences#search" target="_BLANK" class="link">about:preferences#search</a> and select <div class="code-inline">4get</div> in the dropdown menu. + +<a href="#chrome"><h2 id="chrome">On Chromium and Blink based browsers</h2></a> +Click the 3 superpositioned dots at the top right of the screen and click on <div class="code-inline">Settings</div>, then search for <div class="code-inline">default search engine</div>, or visit <a href="chrome://settings/searchEngines">chrome://settings/searchEngines</a>.<br><br> + +Once you're there, click the pencil on the last entry under "Search engines" (it's probably DuckDuckGo). Once you do that, a popup will appear. 
Populate it with the following information: + +<table> + <tr> + <td><b>Field</b></td> + <td><b>Value</b></td> + </tr> + <tr> + <td>Search engine</td> + <td>{%server_name%}</td> + </tr> + <tr> + <td>Shortcut</td> + <td>{%server_name%}</td> + </tr> + <tr> + <td>URL with %s in place of query</td> + <td>https://4get.ca/web?s=%s</td> + </tr> +</table> + +Once that's done, click <div class="code-inline">Save</div>. Then, on the right handside of the newly created entry, open the dropdown menu and select <div class="code-inline">Make default</div>. + +<h1>Frequently asked questions</h1> +<a href="#what-is-this"><h2 id="what-is-this">What is this?</h2></a> +This is a metasearch engine that gets results from other engines, and strips away all of the tracking parameters and Microsoft/globohomo bullshit they add. Most of the other alternatives to Google jack themselves off about being ""privacy respecting"" or whatever the fuck but it always turns out to be a total lie, and I just got fed up with their shit honestly. Alternatives like Searx or YaCy all fucking sucks so I made my own thing. + +<a href="#goal"><h2 id="goal">My goal</h2></a> +Provide users with a privacy oriented, extremely lightweight, ad free, free as in freedom (and free beer!) way to search for documents around the internet, with minimal, optional javascript code. My long term goal would be to build my own index (that doesn't suck) and provide users with an unbiased search engine, with no political inclinations. + +<a href="#logs"><h2 id="logs">Do you keep logs?</h2></a> +I store data temporarly to get the next page of results. This might include search queries, tokens and other parameters. These parameters are encrypted using <div class="code-inline">aes-256-gcm</div> on the serber, for which I give you a key (also known internally as <div class="code-inline">npt</div> token). When you make a request to get the next page, you supply the token, the data is decrypted and the request is fulfilled. 
This encrypted data is deleted after 15 minutes, or after it's used, whichever comes first.<br><br> + +I <b>don't</b> log IP addresses, user agents, or anything else. The <div class="code-inline">npt</div> tokens are the only thing that are stored (in RAM, mind you), temporarly, encrypted. + +<a href="#information-sharing"><h2 id="information-sharing">Do you share information with third parties?</h2></a> +Your search queries and supplied filters are shared with the scraper you chose (so I can get the search results, duh). I don't share anything else (that means I don't share your IP address, location, or anything of this kind). There is no way that site can know you're the one searching for something, <u>unless you send out a search query that de-anonymises you.</u> For example, a search query like "hello my full legal name is jonathan gallindo and i want pictures of cloacas" would definitively blow your cover. 4get doesn't contain ads or any third party javascript applets or trackers. I don't profile you, and quite frankly, I don't give a shit about what you search on there.<br><br> + +TL;DR assume those websites can see what you search for, but can't see who you are (unless you're really dumb). + +<a href="#hosting"><h2 id="hosting">Where is this website hosted?</h2></a> +This website is hosted on a Contabo shitbox in the United States. 
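
Reviewer note: the `npt` flow described in the "Do you keep logs?" answer above (encrypt with `aes-256-gcm`, hand the key to the client, delete after 15 minutes or on first use) could be sketched roughly as follows. This is only an illustration written for Node.js, not the project's PHP `lib/backend.php`. The names `storeNextPage`, `getNextPage`, `store` and `TTL_MS` are hypothetical, and the token layout (`id.key`) is an assumption made for the example.

```javascript
// Hypothetical sketch of the "npt" token flow, assuming Node.js.
// 4get does this server-side in PHP; none of these names come from its code.
const crypto = require("crypto");

const TTL_MS = 15 * 60 * 1000; // the about page says 15 minutes
const store = new Map();       // stand-in for the in-RAM cache

// Encrypt the payload, keep only the ciphertext, hand the key to the client.
function storeNextPage(payload, now = Date.now()){

	const key = crypto.randomBytes(32); // 256-bit key, not kept on the server
	const iv = crypto.randomBytes(12);  // 96-bit nonce, the usual size for GCM
	const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);

	const data = Buffer.concat([cipher.update(payload, "utf8"), cipher.final()]);
	const id = crypto.randomBytes(8).toString("hex");

	store.set(id, {iv, data, tag: cipher.getAuthTag(), expires: now + TTL_MS});

	// assumed token layout: "<lookup id>.<hex key>"
	return id + "." + key.toString("hex");
}

// Decrypt with the client-supplied key; the entry is deleted on first use.
function getNextPage(token, now = Date.now()){

	const [id, key] = token.split(".");
	const entry = store.get(id);
	store.delete(id);

	if(entry === undefined || entry.expires <= now){

		throw new Error("Token expired or unknown");
	}

	const decipher =
		crypto.createDecipheriv(
			"aes-256-gcm",
			Buffer.from(key, "hex"),
			entry.iv
		);
	decipher.setAuthTag(entry.tag);

	return Buffer.concat([decipher.update(entry.data), decipher.final()]).toString("utf8");
}
```

Because the server only holds ciphertext, a dump of the cache without the client's token should not reveal the stored query. The scrapers in this patch additionally pass `$proxy` to `backend->store()`, presumably so the next page goes out through the same address; the sketch leaves that part out.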
+ +<a href="#keyboard-shortcuts"><h2 id="keyboard-shortcuts">Keyboard shortcuts?</h2></a> +Use <div class="code-inline">/</div> to focus the search box.<br><br> + +When the image viewer is open, you can use the following keybinds:<br> +<div class="code-inline">Up</div>, <div class="code-inline">Down</div>, <div class="code-inline">Left</div>, <div class="code-inline">Right</div> to rotate the image.<br> +<div class="code-inline">CTRL+Up</div>, <div class="code-inline">CTRL+Down</div>, <div class="code-inline">CTRL+Left</div>, <div class="code-inline">CTRL+Right</div> to mirror the image.<br> +<div class="code-inline">Escape</div> to exit the image viewer. + +<a href="#schizo"><h2 id="schizo">How can I trust you?</h2></a> +You just sort of have to take my word for it right now. If you'd rather trust yourself instead of me (I believe in you!!), all of the code on this website is available through my <a href="https://git.lolcat.ca/lolcat" class="link">git page</a> for you to host on your own machines. Just a reminder: if you're the sole user of your instance, it doesn't take immense brain power for Microshit to figure out you basically just switched IP addresses. Invite your friends to use your instance! + +<a href="#donate"><h2 id="donate">Support the project</h2></a> +Donate to me through ko-fi: <a href="https://ko-fi.com/lolcat" target="BLANK" rel="noreferrer">ko-fi.com/lolcat</a><br> +Please donate; I sent myself a donation to test if it works and it looks fucking dumb. Reasons to donate are listed on there. Thank you! + +<a href="#contact"><h2 id="contact">I want to report abuse or have erotic roleplay through email</h2></a> +I don't know about that second part but if you want to talk to me, just drop me an email...<br><br> + +<b>Message to all DMCA enforcers:</b> I don't host any of the content. Everything you see here is <u>proxied</u> through my shitbox with no moderation.
Please reach out to the people hosting the infringing content instead.<br><br> + +<a href="https://lolcat.ca" rel="dofollow" class="link">Click here to contact me!</a><br><br> + +<a href="https://validator.w3.org/nu/?doc=https%3A%2F%2F4get.ca" title="W3 Valid!"> + <img src="/static/icon/w3html.png" alt="Valid W3C HTML 4.01" width="88" height="31"> +</a> diff --git a/template/header.html b/template/header.html index 9e519fc..fcdbb13 100644 --- a/template/header.html +++ b/template/header.html @@ -3,14 +3,15 @@ <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> <title>{%title%}</title> - <link rel="stylesheet" href="/static/style.css?v4"> + <link title="{%server_name%}" href="/opensearch{%ac%}" rel="search" type="application/opensearchdescription+xml"> + <link rel="stylesheet" href="/static/style.css?v{%version%}"> + {%style%} <meta name="viewport" content="width=device-width,initial-scale=1"> <meta name="robots" content="{%index%}index,{%index%}follow"> <link rel="icon" type="image/x-icon" href="/favicon.ico"> - <meta name="description" content="4get.ca: {%description%}"> - <link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml"> + <meta name="description" content="{%server_name%}: {%description%}"> </head> - <body{%body_class%}> + <body> <form method="GET" autocomplete="off"> <div class="searchbox"> <input type="submit" value="Search" tabindex="-1"> diff --git a/template/header_nofilters.html b/template/header_nofilters.html new file mode 100644 index 0000000..116eef6 --- /dev/null +++ b/template/header_nofilters.html @@ -0,0 +1,14 @@ +<!DOCTYPE html> +<html lang="en"> +<head> + <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> + <title>{%title%}</title> + <link title="{%server_name%}" href="/opensearch{%ac%}" rel="search" type="application/opensearchdescription+xml"> + <link rel="stylesheet" href="/static/style.css?v{%version%}"> + {%style%} + <meta name="viewport" 
content="width=device-width,initial-scale=1"> + <meta name="robots" content="index,follow"> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + <meta name="description" content="{%server_name%}: {%title%}"> +</head> +<body{%class%}> diff --git a/template/home.html b/template/home.html index 9818677..b4f0735 100644 --- a/template/home.html +++ b/template/home.html @@ -2,15 +2,17 @@ <html lang="en"> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> - <title>4get</title> + <title>{%server_name%}</title> + <link title="{%server_name%}" href="/opensearch{%ac%}" rel="search" type="application/opensearchdescription+xml"> + <link rel="sitemap" type="application/xml" title="Sitemap" href="/sitemap"> <meta name="viewport" content="width=device-width,initial-scale=1"> - <link rel="stylesheet" href="/static/style.css?v4"> + <link rel="stylesheet" href="/static/style.css?v{%version%}"> + {%style%} <meta name="robots" content="index,follow"> <link rel="icon" type="image/x-icon" href="/favicon.ico"> - <meta name="description" content="4get.ca: They live in our walls!"> - <link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml"> + <meta name="description" content="{%server_name%}: {%server_short_description%}"> </head> - <body class="home {%body_class%}"> + <body class="home"> <div id="center"> <form method="GET" autocomplete="off" action="web"> <div class="logo"> @@ -26,13 +28,12 @@ <div class="autocomplete"></div> </div> </form> - <a href="settings">Settings</a> • <a href="api.txt">API</a> • <a href="about">About</a> • <a href="https://git.lolcat.ca/lolcat/4get">Source</a> • <a href="https://ko-fi.com/lolcat" rel="noreferrer" target="BLANK">Donate</a> + <a href="settings">Settings</a> • <a href="instances">Instances</a> • <a href="api.txt">API</a> • <a href="about">About</a> • <a href="https://git.lolcat.ca/lolcat/4get">Source</a> • <a href="https://ko-fi.com/lolcat" rel="noreferrer" 
target="BLANK">Donate</a> <div class="subtext"> - Clearnet: <a href="https://4get.ca">4get.ca</a><br> - Tor: <a href="http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion">4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion</a><br> - Report a problem: <a href="https://lolcat.ca">lolcat.ca</a> + <a href="https://4get.ca">Clearnet</a> • <a href="http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion">Tor</a> • <a href="https://lolcat.ca">Report a problem</a><br> + Running on <b>v{%version%}</b>!! </div> </div> - <script src="/static/client.js?v4"></script> + <script src="/static/client.js?v{%version%}"></script> </body> </html> diff --git a/template/images.html b/template/images.html index 1c5b23a..a19ddeb 100644 --- a/template/images.html +++ b/template/images.html @@ -2,6 +2,6 @@ {%images%} </div> {%nextpage%} - <script src="/static/client.js?v3"></script> + <script src="/static/client.js?v{%version%}"></script> </body> </html> diff --git a/template/instances.html b/template/instances.html new file mode 100644 index 0000000..829e638 --- /dev/null +++ b/template/instances.html @@ -0,0 +1,36 @@ +<!DOCTYPE html> +<html lang="en"> +<head> + <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> + <title>Instance browser</title> + <link title="{%server_name%}" href="/opensearch{%ac%}" rel="search" type="application/opensearchdescription+xml"> + <link rel="stylesheet" href="/static/style.css?v{%version%}"> + {%style%} + <meta name="viewport" content="width=device-width,initial-scale=1"> + <meta name="robots" content="index,follow"> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + <meta name="description" content="{%server_name%}: Instances"> +</head> +<body class="instances"> + <h1>Instance browser</h1> + Learn how to setup your own instance here! 
<a href="https://git.lolcat.ca/lolcat/4get" target="_BLANK">https://git.lolcat.ca/lolcat/4get</a> + <noscript> + <div class="quote">For a better experience, whitelist javascript usage on this page.</div> + <table> + <thead> + <tr> + <th class="expand">Server</th> + </tr> + </thead> + <tbody> + {%instances_html%} + </tbody> + </table> + </noscript> + <div id="popup-bg"></div> + <div class="popup-wrapper"> + <div class="popup"></div> + </div> + <script src="static/serverping.js?v{%version%}"></script> +</body> +</html> diff --git a/template/search.html b/template/search.html index 35da30d..d7f73a5 100644 --- a/template/search.html +++ b/template/search.html @@ -11,6 +11,6 @@ {%left%} </div> </div> - <script src="/static/client.js?v4"></script> + <script src="/static/client.js?v{%version%}"></script> </body> </html> @@ -3,6 +3,8 @@ /* Initialize random shit */ +include "data/config.php"; + include "lib/frontend.php"; $frontend = new frontend(); @@ -28,20 +30,7 @@ try{ }catch(Exception $error){ - echo - $frontend->drawerror( - "Shit", - 'This scraper returned an error:' . - '<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' . - 'Things you can try:' . - '<ul>' . - '<li>Use a different scraper</li>' . - '<li>Remove keywords that could cause errors</li>' . - '<li>Use another 4get instance</li>' . - '</ul><br>' . - 'If the error persists, please <a href="/about">contact the administrator</a>.' - ); - die(); + $frontend->drawscrapererror($error->getMessage(), $get, "videos"); } $categories = [ @@ -3,6 +3,8 @@ /* Initialize random shit */ +include "data/config.php"; + include "lib/frontend.php"; $frontend = new frontend(); @@ -28,20 +30,7 @@ try{ }catch(Exception $error){ - echo - $frontend->drawerror( - "Shit", - 'This scraper returned an error:' . - '<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' . - 'Things you can try:' . - '<ul>' . - '<li>Use a different scraper</li>' . 
- '<li>Remove keywords that could cause errors</li>' . - '<li>Use another 4get instance</li>' . - '</ul><br>' . - 'If the error persists, please <a href="/about">contact the administrator</a>.' - ); - die(); + $frontend->drawscrapererror($error->getMessage(), $get, "web"); } /* |