iPromKnight
79a6aa3cb0
Improve producer matching - Add tissue service
The tissue service will sanitize the existing database of ingested torrents by matching existing titles against the new banned word lists. Now with added Kleenex.
2024-03-12 10:29:13 +00:00
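This is a minimal Python sketch of the kind of title-vs-banned-word matching this commit describes. It is only an illustration. The real tissue service lives in the C# codebase, and its matching rules are not shown here. `find_titles_to_remove` and `build_banned_pattern` are hypothetical names. Words are matched as whole tokens. Separators common in release names (`.`, `_`, `-`, spaces) count as boundaries, so a banned word embedded inside a longer word is not flagged.

```python
import re


def build_banned_pattern(banned_words):
    """Compile one case-insensitive regex matching any banned word as a whole token (hypothetical helper)."""
    escaped = [re.escape(w) for w in banned_words]
    # Anything that is not a letter or digit counts as a token boundary,
    # so '.', '_', '-' and spaces in release titles all separate tokens.
    return re.compile(
        r"(?<![A-Za-z0-9])(?:" + "|".join(escaped) + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def find_titles_to_remove(titles, banned_words):
    """Return the titles that contain at least one banned word (hypothetical helper)."""
    if not banned_words:
        return []
    pattern = build_banned_pattern(banned_words)
    return [t for t in titles if pattern.search(t)]


titles = ["Some.Movie.2020.1080p.WEB", "Another.Show.S01E02.xxx.720p", "Boxxxer.Doc.2019"]
print(find_titles_to_remove(titles, ["xxx"]))
```

Only the second title is flagged. `Boxxxer` survives because `xxx` sits inside a longer word.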
iPromKnight
aeb83c19f8
Simplification of parsing in consumer
should speed up parsing massively, especially when imdbIds are found in MongoDB
2024-03-11 10:56:04 +00:00
iPromKnight
5c310427b4
Fix Nyaa category
2024-03-11 08:59:55 +00:00
iPromKnight
02150482df
reduce CPU cycles spent on parsing in the producer
2024-03-10 15:14:17 +00:00
iPromKnight
2e774058ff
Few extra terms getting through
2024-03-10 14:54:25 +00:00
iPromKnight
ad04d323b4
remove log line of adult content
2024-03-10 13:54:35 +00:00
iPromKnight
e2b45e799d
[skip ci] Remove debug logging of adult terms found
2024-03-10 13:49:51 +00:00
iPromKnight
6c03f79933
Complete
2024-03-10 13:48:27 +00:00
iPromKnight
320fccc8e8
[skip ci] More work on parsing - still need to fix seasons and use banned words
2024-03-10 12:48:19 +00:00
iPromKnight
8d82a17876
re-disable services other than DMM while developing
re-enable
disable again - will squash, don't worry
enable again
disable again
2024-03-10 12:48:19 +00:00
iPromKnight
f719520b3b
[skip ci] Ignore all run profiles to prevent PAT leaking
re-enable these; testing that only the producer should build
2024-03-10 12:48:19 +00:00
iPromKnight
6600fceb1a
WIP: Blacklisting DMM porn
Create adult text classifier ML Model
WIP - starting to write PTN in C#
More work on season, show and movie parsing
Remove ML project
2024-03-10 12:48:16 +00:00
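PTN here most likely refers to parse-torrent-name, a library for pulling fields out of release titles. The sketch below is a rough illustration of the season/episode part of that work. It extracts `SxxEyy` tokens with a single regex. The function name and pattern are hypothetical and far simpler than PTN's own rules. It does not handle season ranges such as `S01-S03` or alternative formats such as `1x02`.

```python
import re

# Matches S01E02, s2e10 or a bare season token like S03, bounded by
# non-alphanumeric characters ('.', '_', '-', space) or the string edges.
SEASON_EPISODE = re.compile(
    r"(?<![A-Za-z0-9])S(\d{1,2})(?:E(\d{1,3}))?(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def parse_season_episode(title):
    """Return (season, episode) from a release title (episode may be None), or None if absent."""
    m = SEASON_EPISODE.search(title)
    if m is None:
        return None
    season = int(m.group(1))
    episode = int(m.group(2)) if m.group(2) else None
    return season, episode


print(parse_season_episode("Some.Show.S01E02.720p"))
print(parse_season_episode("Show.S03.Complete"))
print(parse_season_episode("Movie.2020.1080p"))
```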
purple_emily
79409915cf
Run pre-commit
2024-03-08 14:34:53 +00:00
iPromKnight
a609af66f9
change default retry window to a larger delay
Let's not hammer them
2024-03-03 20:28:53 +00:00
iPromKnight
c3a281c39f
Retry policy and circuit breaker policy
2024-03-03 19:54:32 +00:00
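This is a minimal sketch of combining a retry policy that uses exponential backoff with a consecutive-failure circuit breaker. The project itself uses Polly in C#. This Python version only illustrates the idea and is not Polly's API or the project's configuration. `CircuitBreaker`, `retry_with_backoff` and all thresholds and delays are hypothetical placeholders. `sleep` and `clock` are injectable so the behaviour can be tested without real waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are short-circuited."""


class CircuitBreaker:
    """Consecutive-failure circuit breaker (illustrative, not Polly's implementation)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            # Half-open: let one trial call through. Because failures is still
            # at the threshold, a single further failure re-opens the circuit.
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry_with_backoff(fn, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn, retrying on failure after base_delay * 2**attempt seconds."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not keep retrying while the breaker is open
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

Wrapping the breaker inside the retry, as in `retry_with_backoff(lambda: breaker.call(fetch))`, lets retries absorb transient errors. Here `fetch` stands for whatever request function is being protected. A persistently failing instance trips the breaker, which then stops further calls until `reset_timeout` has elapsed.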
iPromKnight
62decbf994
Ensure we throw
When torrentio/knightcrawler instances return invalid status codes on fetch requests for JSON payloads, throw before parsing the JSON so that Polly catches the failure in the policy-wrapped resiliency handler.
2024-03-03 19:30:06 +00:00
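The idea behind this commit can be sketched in Python. The code checks the HTTP status before it tries to parse the JSON body, and it raises on failure so that an outer retry or circuit-breaker wrapper records the attempt as failed. `BadStatusError` and `parse_json_payload` are hypothetical names, and the sample payload is made up. In .NET the closest built-in is `HttpResponseMessage.EnsureSuccessStatusCode()`, but the commit does not say which mechanism the producer uses.

```python
import json


class BadStatusError(Exception):
    """Raised for non-2xx responses so an outer resiliency policy sees a failure."""

    def __init__(self, status):
        super().__init__(f"unexpected status code {status}")
        self.status = status


def parse_json_payload(status, body):
    """Validate the status code first, and only then parse the JSON body."""
    if not 200 <= status < 300:
        # Raising here, instead of returning None or trying to parse an error
        # page, lets the caller's retry/breaker wrapper count a failed attempt.
        raise BadStatusError(status)
    return json.loads(body)


print(parse_json_payload(200, '{"streams": []}'))
```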
iPromKnight
c61e9e94e1
Rethrow so Polly captures failures on requests.
2024-03-03 19:22:10 +00:00
iPromKnight
d8f48fcee9
Introduce a circuit breaker; also exit the loop on MongoDB failures.
2024-03-03 16:19:56 +00:00
iPromKnight
4b3bb2b5bd
hotfix: continue, not break - add slight delay - log params
2024-03-03 04:10:14 +00:00
iPromKnight
95fa48c851
Woke up to see a discussion about torrentio scraping: powered by community
Was a little inspired. Now that we have a self-populating database of IMDb IDs, why shouldn't we be able to scrape any other instance of torrentio or knightcrawler?
Also restructured the producer into vertical slices to make it easier to work with; there was too much flicking back and forth between Jobs and Crawlers when configuring.
2024-03-02 18:41:57 +00:00
iPromKnight
1b9a01c677
BREAKING: Clean up RabbitMQ env vars and GitHub PAT
2024-02-28 12:57:55 +00:00
Gabisonfire
6c4282b6de
Adds Nyaa Crawler
2024-02-27 10:08:39 -05:00
iPromKnight
49a6283f26
Fix DMM so that all pages are enumerated
Fixes #95 by switching to Git trees instead of the contents API.
2024-02-27 13:51:21 +00:00
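This sketch lists every file in a repository through GitHub's Git Trees API (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}` with the `recursive` parameter) instead of the contents API. GitHub documents the contents API as returning at most 1,000 files per directory. Only URL construction and response filtering are shown. The owner, repo, ref and file suffix are placeholder assumptions, and the function names are hypothetical. A real request would also need to send the authenticated GitHub PAT.

```python
def trees_url(owner, repo, tree_sha):
    """Build the GitHub REST URL for a recursive Git tree listing (tree_sha may be a branch name)."""
    return f"https://api.github.com/repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1"


def blob_paths(tree_response, suffix=""):
    """Return file (type 'blob') paths from a Git trees response, optionally filtered by suffix."""
    if tree_response.get("truncated"):
        # GitHub sets 'truncated' when the recursive listing hit its size limit;
        # a caller would then need to fetch subtrees individually.
        raise ValueError("tree listing truncated")
    return [
        entry["path"]
        for entry in tree_response["tree"]
        if entry["type"] == "blob" and entry["path"].endswith(suffix)
    ]
```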
David Howell
2cae5296a2
Build multi-platform images
Refactor GitHub Actions workflow for build
Run Dockle and Trivy, upload sarif reports to GitHub
Refactor Dockerfiles based on best practices
2024-02-08 06:00:48 +00:00
iPromKnight
e461e26b0f
Change Postgres configuration in the producer to use the env vars from the stack
2024-02-04 15:03:07 +00:00
iPromKnight
57f4757541
Implement max queue size and max batch size when publishing
MaxPublishBatchSize must be set, but MaxQueueSize can be set to 0 to disable the RabbitMQ queue size check.
2024-02-02 14:43:29 +00:00
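This Python sketch shows one plausible reading of these two settings. Each publish is capped at the batch size. When a queue limit is configured, the producer publishes only as much as the RabbitMQ queue has room for. `plan_publish` and its snake_case parameters are hypothetical stand-ins for `MaxPublishBatchSize` and `MaxQueueSize`. The current queue depth is assumed to be fetched elsewhere. The commit does not say whether the producer caps a publish to the remaining headroom or skips the window entirely.

```python
def plan_publish(pending, max_batch_size, max_queue_size, current_queue_depth):
    """Return how many pending items to publish in this window.

    max_batch_size must be positive; max_queue_size == 0 disables the queue-depth check.
    """
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be set to a positive value")
    batch = min(pending, max_batch_size)
    if max_queue_size == 0:
        return batch  # queue-size check disabled
    # Never push the queue past its configured limit.
    headroom = max(0, max_queue_size - current_queue_depth)
    return min(batch, headroom)
```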
iPromKnight
68edaba308
Introduce max batch size and a configurable publish window
Still need to implement queue size limit
Also fixes env var consistency between addon and consumer
2024-02-02 13:49:54 +00:00
iPromKnight
ee994fc8be
ignore bin and obj
2024-02-01 16:47:45 +00:00
iPromKnight
ab17ef81be
Big rewrite - distributed consumers for ingestion/scraping (scalable) - single producer written in C#.
Changed from page scraping to RSS XML scraping
Includes RealDebridManager hashlist decoding (requires a read-only GitHub PAT, as requests must be authenticated). This allows ingestion of 200k+ entries in a few hours.
Simplifies much of torrentio to deal with the new data
2024-02-01 16:38:45 +00:00