22 Commits

Author SHA1 Message Date
Giorgi
594320ed63 Fix DMM parser and a couple of other minor issues. (#226) 2024-11-19 19:00:06 +00:00
Gabisonfire
a7d5944d25 Adds missing body parameter (#232) 2024-11-15 08:35:46 -05:00
Gabisonfire
c053a5f8da Automate pull request for changelog (#231) 2024-11-15 08:34:09 -05:00
Gabisonfire
5611d3776f [skip ci] Update changelog (#230)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-11-15 08:27:29 -05:00
Gabisonfire
833ac11a96 Update git_cliff.yml (#229) 2024-11-15 08:26:33 -05:00
Gabisonfire
16d8707c48 Commit git cliff to the proper branch (#228) 2024-11-15 08:16:24 -05:00
Gabisonfire
6dfbaa4739 Fix git cliff branch protection issues (#227) 2024-11-15 08:11:50 -05:00
FunkeCoder23
03b5617312 Update release workflow (#219)
* add bare-bones contributing guide

* remove GITLAB_PAT token instructions as unneeded

* add git cliff workflow and config

add git cliff information to contributing.md

squash

---------

Co-authored-by: funkecoder23 <funkecoder@DESKTOP-AMVOBPG>
2024-11-15 07:39:15 -05:00
David Young
19cb42af77 Update all-debrid-api to v1.2.0 for new IP requirement (#212)
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
2024-11-15 07:29:53 -05:00
FunkeCoder23
9344531b34 Add longer timeout for DMM (#218) 2024-07-01 11:40:09 +01:00
FunkeCoder23
723aa6b6a0 update eztv url (#220)
Co-authored-by: funkecoder23 <funkecoder@DESKTOP-AMVOBPG>
2024-05-18 21:39:14 -04:00
Davide Marcoli
e17b476801 Re-add filter step in the processing of the streams (#215) 2024-05-15 12:00:02 +01:00
David Young
2a414d8bc0 Always return title and filename (#210)
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
2024-05-15 11:55:42 +01:00
iPromKnight
9b5f454e6e Python version bump in alpine (#209) 2024-04-22 12:49:59 +01:00
iPromKnight
ad9549c695 Version bump for release (#208) 2024-04-22 12:46:02 +01:00
David Young
1e85cb00ff INNER JOIN when selecting files and torrents to avoid null results (#207)
* INNER JOIN when selecting files and torrents to avoid null results

Signed-off-by: David Young <davidy@funkypenguin.co.nz>

* Extend fix to all torrent types

Signed-off-by: David Young <davidy@funkypenguin.co.nz>

---------

Signed-off-by: David Young <davidy@funkypenguin.co.nz>
2024-04-22 12:43:57 +01:00
iPromKnight
da640a4071 Fix namespaces on extracted scraper info (#204)
* Fix namespaces on extracted scrapers

* version bump
2024-04-11 18:56:29 +01:00
iPromKnight
e6a63fd72e Allow configuration of producer urls (#203)
* Allow configuration of urls in scrapers by mounting the scrapers.json file over the one in the container

* version bump
2024-04-11 18:23:42 +01:00
iPromKnight
02101ac50a Allow qbit concurrency to be configurable (#200) 2024-04-11 18:02:29 +01:00
iPromKnight
3c8ffd5082 Fix Duplicates (#199)
* Fix Duplicates

* Version
2024-04-02 20:31:22 +01:00
iPromKnight
79e0a0f102 DMM Offline (#198)
* Process DMM all locally

single call to github to download the repo archive.
remove need for PAT
update RTN to 0.2.13
change to batch_parse for title parsing from RTN

* introduce concurrent dictionary, and parallelism
2024-04-02 17:01:22 +01:00
purple_emily
6181207513 Fix incorrect file index stored (#197)
* Fix incorrect file index stored

* Update `rank-torrent-name` to latest version

* Knight Crawler version update
2024-04-01 23:08:32 +01:00
57 changed files with 768 additions and 290 deletions


@@ -12,6 +12,9 @@ A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1.
2.
3.
**Expected behavior**
A clear and concise description of what you expected to happen.
@@ -23,6 +26,7 @@ If the logs are short, make sure to triple backtick them, or use https://pastebi
**Hardware:**
- OS and distro: [e.g. Raspberry Pi OS, Ubuntu, Rocky]
- Server: [e.g. VM, Baremetal, Pi]
- Knightcrawler Version: [2.0.xx]
**Additional context**
Add any other context about the problem here.

.github/workflows/git_cliff.yml (new file)

@@ -0,0 +1,39 @@
on:
push:
branches:
- main
workflow_dispatch:
jobs:
changelog:
name: Generate changelog
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generate a changelog
uses: orhun/git-cliff-action@v3
with:
config: cliff.toml
args: --verbose
env:
OUTPUT: CHANGELOG.md
GITHUB_REPO: ${{ github.repository }}
- name: Commit
run: |
git config user.name 'github-actions[bot]'
git config user.email 'github-actions[bot]@users.noreply.github.com'
set +e
git checkout -b feat/changelog_$(date +"%d_%m")
git add CHANGELOG.md
git commit -m "[skip ci] Update changelog"
git push https://${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git feat/changelog_$(date +"%d_%m")
- name: create pull request
run: gh pr create -B main -H feat/changelog_$(date +"%d_%m") --title '[skip ci] Update changelog' --body 'Changelog update by git-cliff'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
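A note on the job above: `set +e` presumably keeps the step from failing when `git commit` finds nothing to commit, and pushing the `feat/changelog_$(date +"%d_%m")` branch and opening a PR with `gh pr create` avoids writing straight to the protected `main` branch — the issue #227 and #228 were fixing.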

CHANGELOG.md (new file)

@@ -0,0 +1,22 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.0.0] - 2024-03-25
### Details
#### Changed
- Change POSTGRES_USERNAME to POSTGRES_USER. Oops by @purple-emily
- Change POSTGRES_DATABASE to POSTGRES_DB by @purple-emily
- Two movie commands instead of movie and tv by @purple-emily
- Cleanup RabbitMQ env vars, and Github Pat by @iPromKnight
#### Fixed
- HRD -> HDR by @mplewis
## New Contributors
* @mplewis made their first contribution
<!-- generated by git-cliff -->

CONTRIBUTING.md (new file)

@@ -0,0 +1,34 @@
We use [Meaningful commit messages](https://reflectoring.io/meaningful-commit-messages/)
Tl;dr:
1. It should answer the question: "What happens if the changes are applied?"
2. Use the imperative, present tense. It is easier to read and scan quickly:
```
Right: Add feature to alert admin for new user registration
Wrong: Added feature ... (past tense)
```
3. The summary should always be able to complete the following sentence:
`If applied, this commit will…`
We use [git-cliff](https://git-cliff.org) for our changelog.
The breaking flag is set to true when the commit has an exclamation mark after the commit type and scope, e.g.:
`feat(scope)!: this is a breaking change`
Keywords (Commit messages should start with these):
```
# Added
add
support
# Removed
remove
delete
# Fixed
test
fix
```
Any other commits will fall under the `Changed` category.
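For example (hypothetical messages, grouped per the keywords above):
```
add kitsu id support        -> Added
remove github pat handling  -> Removed
fix dmm parser crash        -> Fixed
bump npgsql to 8.0.3        -> Changed (catch-all)
```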
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html)


@@ -67,30 +67,6 @@ Then set any of the values you would like to customize.
By default, Knight Crawler is configured to be *relatively* conservative in its resource usage. If running on a decent machine (16GB RAM, i5+ or equivalent), you can increase some settings to increase consumer throughput. This is especially helpful if you have a large backlog from [importing databases](#importing-external-dumps).
### DebridMediaManager setup (optional)
There are some optional steps you should take to maximise the number of movies/tv shows we can find.
We can search DebridMediaManager hash lists which are hosted on GitHub. This allows us to add hundreds of thousands of movies and tv shows, but it requires a Personal Access Token to be generated. The software only needs read access and only for public repositories. To generate one, please follow these steps:
1. Navigate to GitHub settings -> Developer Settings -> Personal access tokens -> Fine-grained tokens (click [here](https://github.com/settings/tokens?type=beta) for a direct link)
2. Press `Generate new token`
3. Fill out the form (example data below):
```
Token name:
KnightCrawler
Expiration:
90 days
Description:
<blank>
Repository access
(checked) Public Repositories (read-only)
```
4. Click `Generate token`
5. Take the new token and add it to the bottom of the [stack.env](deployment/docker/stack.env) file
```
GITHUB_PAT=<YOUR TOKEN HERE>
```
### Configure external access
Please choose which applies to you:

cliff.toml (new file)

@@ -0,0 +1,112 @@
# git-cliff ~ configuration file
# https://git-cliff.org/docs/configuration
[changelog]
# changelog header
header = """
# Changelog\n
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).\n
"""
# template for the changelog body
# https://keats.github.io/tera/docs/#introduction
body = """
{%- macro remote_url() -%}
https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}
{%- endmacro -%}
{% if version -%}
## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
{% else -%}
## [Unreleased]
{% endif -%}
### Details\
{% for group, commits in commits | group_by(attribute="group") %}
#### {{ group | upper_first }}
{%- for commit in commits %}
- {{ commit.message | upper_first | trim }}\
{% if commit.github.username %} by @{{ commit.github.username }}{%- endif -%}
{% if commit.github.pr_number %} in \
[#{{ commit.github.pr_number }}]({{ self::remote_url() }}/pull/{{ commit.github.pr_number }}) \
{%- endif -%}
{% endfor %}
{% endfor %}
{%- if github.contributors | filter(attribute="is_first_time", value=true) | length != 0 %}
## New Contributors
{%- endif -%}
{% for contributor in github.contributors | filter(attribute="is_first_time", value=true) %}
* @{{ contributor.username }} made their first contribution
{%- if contributor.pr_number %} in \
[#{{ contributor.pr_number }}]({{ self::remote_url() }}/pull/{{ contributor.pr_number }}) \
{%- endif %}
{%- endfor %}\n
"""
# template for the changelog footer
footer = """
{%- macro remote_url() -%}
https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}
{%- endmacro -%}
{% for release in releases -%}
{% if release.version -%}
{% if release.previous.version -%}
[{{ release.version | trim_start_matches(pat="v") }}]: \
{{ self::remote_url() }}/compare/{{ release.previous.version }}..{{ release.version }}
{% endif -%}
{% else -%}
[unreleased]: {{ self::remote_url() }}/compare/{{ release.previous.version }}..HEAD
{% endif -%}
{% endfor %}
<!-- generated by git-cliff -->
"""
# remove the leading and trailing whitespace from the templates
trim = true
[git]
# parse the commits based on https://www.conventionalcommits.org
conventional_commits = true
# filter out the commits that are not conventional
filter_unconventional = true
# process each line of a commit as an individual commit
split_commits = false
# regex for preprocessing the commit messages
commit_preprocessors = [
# remove issue numbers from commits
{ pattern = '\((\w+\s)?#([0-9]+)\)', replace = "" },
]
# regex for parsing and grouping commits
commit_parsers = [
{ message = "^.*: add", group = "Added" },
{ message = "^add", group = "Added" },
{ message = "^.*: support", group = "Added" },
{ message = "^support", group = "Added" },
{ message = "^.*: remove", group = "Removed" },
{ message = "^remove", group = "Removed" },
{ message = "^.*: delete", group = "Removed" },
{ message = "^delete", group = "Removed" },
{ message = "^.*: test", group = "Fixed" },
{ message = "^test", group = "Fixed" },
{ message = "^.*: fix", group = "Fixed" },
{ message = "^fix", group = "Fixed" },
{ message = "^.*", group = "Changed" },
]
# protect breaking changes from being skipped due to matching a skipping commit_parser
protect_breaking_commits = false
# filter out the commits that are not matched by commit parsers
filter_commits = true
# regex for matching git tags
tag_pattern = "v[0-9].*"
# regex for skipping tags
skip_tags = "v0.1.0-beta.1"
# regex for ignoring tags
ignore_tags = ""
# sort the tags topologically
topo_order = false
# sort the commits inside sections by oldest/newest order
sort_commits = "oldest"
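Note that git-cliff applies `commit_parsers` in order, using the first rule that matches, so the trailing `^.*` entry is the catch-all that routes every remaining commit to `Changed` — mirroring the keyword list in CONTRIBUTING.md.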


@@ -14,12 +14,14 @@ program=
[BitTorrent]
Session\AnonymousModeEnabled=true
Session\BTProtocol=TCP
Session\ConnectionSpeed=150
Session\DefaultSavePath=/downloads/
Session\ExcludedFileNames=
Session\MaxActiveCheckingTorrents=5
Session\MaxActiveDownloads=10
Session\MaxActiveCheckingTorrents=20
Session\MaxActiveDownloads=20
Session\MaxActiveTorrents=50
Session\MaxActiveUploads=50
Session\MaxConcurrentHTTPAnnounces=1000
Session\MaxConnections=2000
Session\Port=6881
Session\QueueingSystemEnabled=true


@@ -94,7 +94,7 @@ services:
condition: service_healthy
env_file: stack.env
hostname: knightcrawler-addon
image: gabisonfire/knightcrawler-addon:2.0.18
image: gabisonfire/knightcrawler-addon:2.0.26
labels:
logging: promtail
networks:
@@ -117,7 +117,7 @@ services:
redis:
condition: service_healthy
env_file: stack.env
image: gabisonfire/knightcrawler-consumer:2.0.18
image: gabisonfire/knightcrawler-consumer:2.0.26
labels:
logging: promtail
networks:
@@ -138,7 +138,7 @@ services:
redis:
condition: service_healthy
env_file: stack.env
image: gabisonfire/knightcrawler-debrid-collector:2.0.18
image: gabisonfire/knightcrawler-debrid-collector:2.0.26
labels:
logging: promtail
networks:
@@ -152,7 +152,7 @@ services:
migrator:
condition: service_completed_successfully
env_file: stack.env
image: gabisonfire/knightcrawler-metadata:2.0.18
image: gabisonfire/knightcrawler-metadata:2.0.26
networks:
- knightcrawler-network
restart: "no"
@@ -163,7 +163,7 @@ services:
postgres:
condition: service_healthy
env_file: stack.env
image: gabisonfire/knightcrawler-migrator:2.0.18
image: gabisonfire/knightcrawler-migrator:2.0.26
networks:
- knightcrawler-network
restart: "no"
@@ -182,7 +182,7 @@ services:
redis:
condition: service_healthy
env_file: stack.env
image: gabisonfire/knightcrawler-producer:2.0.18
image: gabisonfire/knightcrawler-producer:2.0.26
labels:
logging: promtail
networks:
@@ -207,7 +207,7 @@ services:
deploy:
replicas: ${QBIT_REPLICAS:-0}
env_file: stack.env
image: gabisonfire/knightcrawler-qbit-collector:2.0.18
image: gabisonfire/knightcrawler-qbit-collector:2.0.26
labels:
logging: promtail
networks:


@@ -20,7 +20,7 @@ x-depends: &knightcrawler-app-depends
services:
metadata:
image: gabisonfire/knightcrawler-metadata:2.0.18
image: gabisonfire/knightcrawler-metadata:2.0.26
env_file: ../../.env
networks:
- knightcrawler-network
@@ -30,7 +30,7 @@ services:
condition: service_completed_successfully
migrator:
image: gabisonfire/knightcrawler-migrator:2.0.18
image: gabisonfire/knightcrawler-migrator:2.0.26
env_file: ../../.env
networks:
- knightcrawler-network
@@ -40,7 +40,7 @@ services:
condition: service_healthy
addon:
image: gabisonfire/knightcrawler-addon:2.0.18
image: gabisonfire/knightcrawler-addon:2.0.26
<<: [*knightcrawler-app, *knightcrawler-app-depends]
restart: unless-stopped
hostname: knightcrawler-addon
@@ -48,22 +48,22 @@ services:
- "7000:7000"
consumer:
image: gabisonfire/knightcrawler-consumer:2.0.18
image: gabisonfire/knightcrawler-consumer:2.0.26
<<: [*knightcrawler-app, *knightcrawler-app-depends]
restart: unless-stopped
debridcollector:
image: gabisonfire/knightcrawler-debrid-collector:2.0.18
image: gabisonfire/knightcrawler-debrid-collector:2.0.26
<<: [*knightcrawler-app, *knightcrawler-app-depends]
restart: unless-stopped
producer:
image: gabisonfire/knightcrawler-producer:2.0.18
image: gabisonfire/knightcrawler-producer:2.0.26
<<: [*knightcrawler-app, *knightcrawler-app-depends]
restart: unless-stopped
qbitcollector:
image: gabisonfire/knightcrawler-qbit-collector:2.0.18
image: gabisonfire/knightcrawler-qbit-collector:2.0.26
<<: [*knightcrawler-app, *knightcrawler-app-depends]
restart: unless-stopped
depends_on:


@@ -32,12 +32,10 @@ COLLECTOR_DEBRID_ENABLED=true
COLLECTOR_REAL_DEBRID_API_KEY=
QBIT_HOST=http://qbittorrent:8080
QBIT_TRACKERS_URL=https://raw.githubusercontent.com/ngosang/trackerslist/master/trackers_all_http.txt
QBIT_CONCURRENCY=8
# Number of replicas for the qBittorrent collector and qBitTorrent client. Should be 0 or 1.
QBIT_REPLICAS=0
# Addon
DEBUG_MODE=false
# Producer
GITHUB_PAT=


@@ -12,7 +12,7 @@
"@redis/client": "^1.5.14",
"@redis/json": "^1.0.6",
"@redis/search": "^1.1.6",
"all-debrid-api": "^1.1.0",
"all-debrid-api": "^1.2.0",
"axios": "^1.6.1",
"bottleneck": "^2.19.5",
"cache-manager": "^3.4.4",


@@ -10,7 +10,7 @@
},
"dependencies": {
"@putdotio/api-client": "^8.42.0",
"all-debrid-api": "^1.1.0",
"all-debrid-api": "^1.2.0",
"axios": "^1.6.1",
"bottleneck": "^2.19.5",
"cache-manager": "^3.4.4",


@@ -3,6 +3,7 @@ import { addonBuilder } from 'stremio-addon-sdk';
import { cacheWrapStream } from './lib/cache.js';
import { dummyManifest } from './lib/manifest.js';
import * as repository from './lib/repository.js';
import applyFilters from "./lib/filter.js";
import applySorting from './lib/sort.js';
import { toStreamInfo, applyStaticInfo } from './lib/streamInfo.js';
import { Type } from './lib/types.js';
@@ -32,6 +33,7 @@ builder.defineStreamHandler((args) => {
.then(records => records
.sort((a, b) => b.torrent.seeders - a.torrent.seeders || b.torrent.uploadDate - a.torrent.uploadDate)
.map(record => toStreamInfo(record)))))
.then(streams => applyFilters(streams, args.extra))
.then(streams => applySorting(streams, args.extra))
.then(streams => applyStaticInfo(streams))
.then(streams => applyMochs(streams, args.extra))


@@ -84,7 +84,7 @@ export function getImdbIdMovieEntries(imdbId) {
where: {
imdbId: { [Op.eq]: imdbId }
},
include: [Torrent],
include: { model: Torrent, required: true },
limit: 500,
order: [
[Torrent, 'size', 'DESC']
@@ -99,7 +99,7 @@ export function getImdbIdSeriesEntries(imdbId, season, episode) {
imdbSeason: { [Op.eq]: season },
imdbEpisode: { [Op.eq]: episode }
},
include: [Torrent],
include: { model: Torrent, required: true },
limit: 500,
order: [
[Torrent, 'size', 'DESC']
@@ -112,7 +112,7 @@ export function getKitsuIdMovieEntries(kitsuId) {
where: {
kitsuId: { [Op.eq]: kitsuId }
},
include: [Torrent],
include: { model: Torrent, required: true },
limit: 500,
order: [
[Torrent, 'size', 'DESC']
@@ -126,7 +126,7 @@ export function getKitsuIdSeriesEntries(kitsuId, episode) {
kitsuId: { [Op.eq]: kitsuId },
kitsuEpisode: { [Op.eq]: episode }
},
include: [Torrent],
include: { model: Torrent, required: true },
limit: 500,
order: [
[Torrent, 'size', 'DESC']
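A note on the fix above: with Sequelize, `include: [Torrent]` emits a LEFT OUTER JOIN, so a file row whose torrent is missing still comes back with a null `torrent`; `required: true` switches the join to an INNER JOIN, which is exactly the null-result problem #207 describes.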


@@ -20,7 +20,7 @@ export function toStreamInfo(record) {
const title = joinDetailParts(
[
joinDetailParts([record.torrent.title.replace(/[, ]+/g, ' ')]),
joinDetailParts([!sameInfo && record.title || undefined]),
joinDetailParts([record.title || undefined]),
joinDetailParts([
joinDetailParts([formatSize(record.size)], '💾 ')
]),


@@ -15,7 +15,7 @@ WORKDIR /app
ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3=~3.11.8-r0 py3-pip && ln -sf python3 /usr/bin/python
RUN apk add --update --no-cache python3=~3.11 py3-pip && ln -sf python3 /usr/bin/python
COPY --from=build /src/out .


@@ -16,13 +16,14 @@ public static class DebridMetaToTorrentMeta
foreach (var metadataEntry in Metadata.Where(m => Filetypes.VideoFileExtensions.Any(ext => m.Value.Filename.EndsWith(ext))))
{
var validFileIndex = int.TryParse(metadataEntry.Key, out var fileIndex);
var fileIndexMinusOne = Math.Max(0, fileIndex - 1);
var file = new TorrentFile
{
ImdbId = ImdbId,
KitsuId = 0,
InfoHash = torrent.InfoHash,
FileIndex = validFileIndex ? fileIndex : 0,
FileIndex = validFileIndex ? fileIndexMinusOne : 0,
Title = metadataEntry.Value.Filename,
Size = metadataEntry.Value.Filesize.GetValueOrDefault(),
};
@@ -66,13 +67,14 @@ public static class DebridMetaToTorrentMeta
foreach (var metadataEntry in Metadata.Where(m => Filetypes.SubtitleFileExtensions.Any(ext => m.Value.Filename.EndsWith(ext))))
{
var validFileIndex = int.TryParse(metadataEntry.Key, out var fileIndex);
var fileIndexMinusOne = Math.Max(0, fileIndex - 1);
var fileId = torrentFiles.FirstOrDefault(
t => Path.GetFileNameWithoutExtension(t.Title) == Path.GetFileNameWithoutExtension(metadataEntry.Value.Filename))?.Id ?? 0;
var file = new SubtitleFile
{
InfoHash = InfoHash,
FileIndex = validFileIndex ? fileIndex : 0,
FileIndex = validFileIndex ? fileIndexMinusOne : 0,
FileId = fileId,
Title = metadataEntry.Value.Filename,
};
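A note on the hunk above: the debrid metadata keys appear to be 1-based file indexes while the stored `FileIndex` is 0-based, so `Math.Max(0, fileIndex - 1)` shifts them down without going negative; unparsable keys still fall back to index 0.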


@@ -1 +1 @@
rank-torrent-name==0.2.5
rank-torrent-name==0.2.13


@@ -8,12 +8,14 @@ public class PostgresConfiguration
private const string PasswordVariable = "PASSWORD";
private const string DatabaseVariable = "DB";
private const string PortVariable = "PORT";
private const string CommandTimeoutVariable = "COMMAND_TIMEOUT_SEC"; // Seconds
private string Host { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(HostVariable);
private string Username { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(UsernameVariable);
private string Password { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(PasswordVariable);
private string Database { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(DatabaseVariable);
private int PORT { get; init; } = Prefix.GetEnvironmentVariableAsInt(PortVariable, 5432);
private int CommandTimeout { get; init; } = Prefix.GetEnvironmentVariableAsInt(CommandTimeoutVariable, 300);
public string StorageConnectionString => $"Host={Host};Port={PORT};Username={Username};Password={Password};Database={Database};";
public string StorageConnectionString => $"Host={Host};Port={PORT};Username={Username};Password={Password};Database={Database};CommandTimeout={CommandTimeout}";
}
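Note: the command timeout now comes from an environment variable (presumably `POSTGRES_COMMAND_TIMEOUT_SEC`, given the other variable names), defaults to 300 seconds, and is appended to the connection string. The same change is repeated in each service's `PostgresConfiguration` below, and it replaces the hard-coded 3000-second `NpgsqlConnectionStringBuilder` override removed from `ImdbDbService` in the next hunk.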


@@ -134,7 +134,7 @@ public class ImdbDbService(PostgresConfiguration configuration, ILogger<ImdbDbSe
{
try
{
await using var connection = CreateNpgsqlConnection();
await using var connection = new NpgsqlConnection(configuration.StorageConnectionString);
await connection.OpenAsync();
await operation(connection);
@@ -145,16 +145,6 @@ public class ImdbDbService(PostgresConfiguration configuration, ILogger<ImdbDbSe
}
}
private NpgsqlConnection CreateNpgsqlConnection()
{
var connectionStringBuilder = new NpgsqlConnectionStringBuilder(configuration.StorageConnectionString)
{
CommandTimeout = 3000,
};
return new(connectionStringBuilder.ConnectionString);
}
private async Task ExecuteCommandWithTransactionAsync(Func<NpgsqlConnection, NpgsqlTransaction, Task> operation, NpgsqlTransaction transaction, string errorMessage)
{
try


@@ -13,7 +13,7 @@
<PackageReference Include="Dapper" Version="2.1.35" />
<PackageReference Include="Microsoft.Extensions.Hosting" Version="8.0.0" />
<PackageReference Include="Microsoft.Extensions.Http" Version="8.0.0" />
<PackageReference Include="Npgsql" Version="8.0.2" />
<PackageReference Include="Npgsql" Version="8.0.3" />
<PackageReference Include="Serilog" Version="3.1.1" />
<PackageReference Include="Serilog.AspNetCore" Version="8.0.1" />
<PackageReference Include="Serilog.Sinks.Console" Version="5.0.1" />


@@ -0,0 +1,43 @@
-- Drop Duplicate Files in Files Table
DELETE FROM public.files
WHERE id NOT IN (
SELECT MAX(id)
FROM public.files
GROUP BY "infoHash", "fileIndex"
);
-- Add Index to files table
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1
FROM pg_constraint
WHERE conname = 'files_unique_infohash_fileindex'
) THEN
ALTER TABLE public.files
ADD CONSTRAINT files_unique_infohash_fileindex UNIQUE ("infoHash", "fileIndex");
END IF;
END $$;
-- Drop Duplicate subtitles in Subtitles Table
DELETE FROM public.subtitles
WHERE id NOT IN (
SELECT MAX(id)
FROM public.subtitles
GROUP BY "infoHash", "fileIndex"
);
-- Add Index to subtitles table
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1
FROM pg_constraint
WHERE conname = 'subtitles_unique_infohash_fileindex'
) THEN
ALTER TABLE public.subtitles
ADD CONSTRAINT subtitles_unique_infohash_fileindex UNIQUE ("infoHash", "fileIndex");
END IF;
END $$;
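This migration is deliberately re-runnable: each `DO $$` block checks `pg_constraint` before adding its unique constraint, and the `DELETE` keeps only the highest `id` per (`infoHash`, `fileIndex`) pair. These constraints are what the `ON CONFLICT ("infoHash", "fileIndex") DO NOTHING` inserts later in this changeset rely on.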


@@ -4,31 +4,38 @@
{
"Name": "SyncEzTvJob",
"IntervalSeconds": 60,
"Enabled": true
"Enabled": true,
"Url": "https://eztvx.to/ezrss.xml",
"XmlNamespace": "http://xmlns.ezrss.it/0.1/"
},
{
"Name": "SyncNyaaJob",
"IntervalSeconds": 60,
"Enabled": true
"Enabled": true,
"Url": "https://nyaa.si/?page=rss&c=1_2&f=0",
"XmlNamespace": "https://nyaa.si/xmlns/nyaa"
},
{
"Name": "SyncTpbJob",
"IntervalSeconds": 60,
"Enabled": true
"Enabled": true,
"Url": "https://apibay.org/precompiled/data_top100_recent.json"
},
{
"Name": "SyncYtsJob",
"IntervalSeconds": 60,
"Enabled": true
"Enabled": true,
"Url": "https://yts.am/rss"
},
{
"Name": "SyncTgxJob",
"IntervalSeconds": 60,
"Enabled": true
"Enabled": true,
"Url": "https://tgx.rs/rss"
},
{
"Name": "SyncDmmJob",
"IntervalSeconds": 1800,
"IntervalSeconds": 10800,
"Enabled": true
},
{
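Each scraper entry now carries its `Url` (and an `XmlNamespace` where the feed needs one), so endpoints can be overridden by mounting a custom scrapers.json over the one in the container (#203); the DMM interval also grows from 1800 to 10800 seconds.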


@@ -14,7 +14,7 @@ WORKDIR /app
ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3=~3.11.8-r0 py3-pip && ln -sf python3 /usr/bin/python
RUN apk add --update --no-cache python3=~3.11 py3-pip && ln -sf python3 /usr/bin/python
COPY --from=build /src/out .


@@ -6,6 +6,12 @@ public abstract class BaseJsonCrawler(IHttpClientFactory httpClientFactory, ILog
protected virtual async Task Execute(string collectionName)
{
if (string.IsNullOrWhiteSpace(Url))
{
logger.LogWarning("No URL provided for {Source} crawl", Source);
return;
}
logger.LogInformation("Starting {Source} crawl", Source);
using var client = httpClientFactory.CreateClient("Scraper");


@@ -4,6 +4,12 @@ public abstract class BaseXmlCrawler(IHttpClientFactory httpClientFactory, ILogg
{
public override async Task Execute()
{
if (string.IsNullOrWhiteSpace(Url))
{
logger.LogWarning("No URL provided for {Source} crawl", Source);
return;
}
logger.LogInformation("Starting {Source} crawl", Source);
using var client = httpClientFactory.CreateClient(Literals.CrawlerClient);


@@ -7,4 +7,8 @@ public class Scraper
public int IntervalSeconds { get; set; } = 60;
public bool Enabled { get; set; } = true;
public string? Url { get; set; }
public string? XmlNamespace { get; set; }
}


@@ -0,0 +1,70 @@
namespace Producer.Features.Crawlers.Dmm;
public class DMMFileDownloader(HttpClient client, ILogger<DMMFileDownloader> logger) : IDMMFileDownloader
{
private const string Filename = "main.zip";
private readonly IReadOnlyCollection<string> _filesToIgnore = [
"index.html",
"404.html",
"dedupe.sh",
"CNAME",
];
public const string ClientName = "DmmFileDownloader";
public async Task<string> DownloadFileToTempPath(CancellationToken cancellationToken)
{
logger.LogInformation("Downloading DMM Hashlists");
var response = await client.GetAsync(Filename, cancellationToken);
var tempDirectory = Path.Combine(Path.GetTempPath(), "DMMHashlists");
EnsureDirectoryIsClean(tempDirectory);
response.EnsureSuccessStatusCode();
await using var stream = await response.Content.ReadAsStreamAsync(cancellationToken);
using var archive = new ZipArchive(stream);
logger.LogInformation("Extracting DMM Hashlists to {TempDirectory}", tempDirectory);
foreach (var entry in archive.Entries)
{
var entryPath = Path.Combine(tempDirectory, Path.GetFileName(entry.FullName));
if (!entry.FullName.EndsWith('/')) // It's a file
{
entry.ExtractToFile(entryPath, true);
}
}
foreach (var file in _filesToIgnore)
{
CleanRepoExtras(tempDirectory, file);
}
logger.LogInformation("Downloaded and extracted Repository to {TempDirectory}", tempDirectory);
return tempDirectory;
}
private static void CleanRepoExtras(string tempDirectory, string fileName)
{
var repoIndex = Path.Combine(tempDirectory, fileName);
if (File.Exists(repoIndex))
{
File.Delete(repoIndex);
}
}
private static void EnsureDirectoryIsClean(string tempDirectory)
{
if (Directory.Exists(tempDirectory))
{
Directory.Delete(tempDirectory, true);
}
Directory.CreateDirectory(tempDirectory);
}
}
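Note that `Path.GetFileName(entry.FullName)` flattens the archive: every file is extracted directly into the `DMMHashlists` temp directory regardless of its path inside the zipball, and `EnsureDirectoryIsClean` wipes that directory up front so repeated runs start from a clean slate.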


@@ -0,0 +1,6 @@
namespace Producer.Features.Crawlers.Dmm;
public class DMMHttpClient
{
}


@@ -1,64 +1,99 @@
namespace Producer.Features.Crawlers.Dmm;
public partial class DebridMediaManagerCrawler(
IHttpClientFactory httpClientFactory,
IDMMFileDownloader dmmFileDownloader,
ILogger<DebridMediaManagerCrawler> logger,
IDataStorage storage,
GithubConfiguration githubConfiguration,
IRankTorrentName rankTorrentName,
IDistributedCache cache) : BaseCrawler(logger, storage)
{
[GeneratedRegex("""<iframe src="https:\/\/debridmediamanager.com\/hashlist#(.*)"></iframe>""")]
private static partial Regex HashCollectionMatcher();
private const string DownloadBaseUrl = "https://raw.githubusercontent.com/debridmediamanager/hashlists/main";
protected override string Url => "";
protected override IReadOnlyDictionary<string, string> Mappings => new Dictionary<string, string>();
protected override string Url => "https://api.github.com/repos/debridmediamanager/hashlists/git/trees/main?recursive=1";
protected override string Source => "DMM";
private const int ParallelismCount = 4;
public override async Task Execute()
{
var client = httpClientFactory.CreateClient("Scraper");
client.DefaultRequestHeaders.Authorization = new("Bearer", githubConfiguration.PAT);
client.DefaultRequestHeaders.UserAgent.ParseAdd("curl");
var tempDirectory = await dmmFileDownloader.DownloadFileToTempPath(CancellationToken.None);
var jsonBody = await client.GetStringAsync(Url);
var files = Directory.GetFiles(tempDirectory, "*.html", SearchOption.AllDirectories);
var json = JsonDocument.Parse(jsonBody);
logger.LogInformation("Found {Files} files to parse", files.Length);
var entriesArray = json.RootElement.GetProperty("tree");
var options = new ParallelOptions { MaxDegreeOfParallelism = ParallelismCount };
logger.LogInformation("Found {Entries} total DMM pages", entriesArray.GetArrayLength());
foreach (var entry in entriesArray.EnumerateArray())
await Parallel.ForEachAsync(files, options, async (file, token) =>
{
await ParsePage(entry, client);
}
var fileName = Path.GetFileName(file);
var torrentDictionary = await ExtractPageContents(file, fileName);
if (torrentDictionary == null)
{
return;
}
await ParseTitlesWithRtn(fileName, torrentDictionary);
var results = await ParseTorrents(torrentDictionary);
if (results.Count <= 0)
{
return;
}
await InsertTorrents(results);
await Storage.MarkPageAsIngested(fileName, token);
});
}
private async Task ParsePage(JsonElement entry, HttpClient client)
private async Task ParseTitlesWithRtn(string fileName, IDictionary<string, DmmContent> page)
{
var (pageIngested, name) = await IsAlreadyIngested(entry);
logger.LogInformation("Parsing titles for {Page}", fileName);
if (string.IsNullOrEmpty(name) || pageIngested)
var batchProcessables = page.Select(value => new RtnBatchProcessable(value.Key, value.Value.Filename)).ToList();
var parsedResponses = rankTorrentName.BatchParse(
batchProcessables.Select<RtnBatchProcessable, string>(bp => bp.Filename).ToList(), trashGarbage: false);
// Filter out unsuccessful responses and match RawTitle to requesting title
var successfulResponses = parsedResponses
.Where(response => response != null && response.Success)
.GroupBy(response => response.Response.RawTitle!)
.ToDictionary(group => group.Key, group => group.First());
var options = new ParallelOptions { MaxDegreeOfParallelism = ParallelismCount };
await Parallel.ForEachAsync(batchProcessables.Select(t => t.InfoHash), options, (infoHash, _) =>
{
return;
}
if (page.TryGetValue(infoHash, out var dmmContent) &&
successfulResponses.TryGetValue(dmmContent.Filename, out var parsedResponse))
{
page[infoHash] = dmmContent with { ParseResponse = parsedResponse };
}
var pageSource = await client.GetStringAsync($"{DownloadBaseUrl}/{name}");
await ExtractPageContents(pageSource, name);
return ValueTask.CompletedTask;
});
}
private async Task ExtractPageContents(string pageSource, string name)
private async Task<ConcurrentDictionary<string, DmmContent>?> ExtractPageContents(string filePath, string filenameOnly)
{
var (pageIngested, name) = await IsAlreadyIngested(filenameOnly);
if (pageIngested)
{
return [];
}
var pageSource = await File.ReadAllTextAsync(filePath);
var match = HashCollectionMatcher().Match(pageSource);
if (!match.Success)
{
logger.LogWarning("Failed to match hash collection for {Name}", name);
await Storage.MarkPageAsIngested(name);
return;
await Storage.MarkPageAsIngested(filenameOnly);
return [];
}
var encodedJson = match.Groups.Values.ElementAtOrDefault(1);
@@ -66,96 +101,123 @@ public partial class DebridMediaManagerCrawler(
if (string.IsNullOrEmpty(encodedJson?.Value))
{
logger.LogWarning("Failed to extract encoded json for {Name}", name);
return;
return [];
}
await ProcessExtractedContentsAsTorrentCollection(encodedJson.Value, name);
var decodedJson = LZString.DecompressFromEncodedURIComponent(encodedJson.Value);
JsonElement arrayToProcess;
try
{
var json = JsonDocument.Parse(decodedJson);
if (json.RootElement.ValueKind == JsonValueKind.Object &&
json.RootElement.TryGetProperty("torrents", out var torrentsProperty) &&
torrentsProperty.ValueKind == JsonValueKind.Array)
{
arrayToProcess = torrentsProperty;
}
else if (json.RootElement.ValueKind == JsonValueKind.Array)
{
arrayToProcess = json.RootElement;
}
else
{
logger.LogWarning("Unexpected JSON format in {Name}", name);
return [];
}
}
catch (Exception ex)
{
logger.LogError("Failed to parse JSON {decodedJson} for {Name}: {Exception}", decodedJson, name, ex);
return [];
}
var torrents = await arrayToProcess.EnumerateArray()
.ToAsyncEnumerable()
.Select(ParsePageContent)
.Where(t => t is not null)
.ToListAsync();
if (torrents.Count == 0)
{
logger.LogWarning("No torrents found in {Name}", name);
await Storage.MarkPageAsIngested(filenameOnly);
return [];
}
var torrentDictionary = torrents
.Where(x => x is not null)
.GroupBy(x => x.InfoHash)
.ToConcurrentDictionary(g => g.Key, g => new DmmContent(g.First().Filename, g.First().Bytes, null));
logger.LogInformation("Parsed {Torrents} torrents for {Name}", torrentDictionary.Count, name);
return torrentDictionary;
}
private async Task ProcessExtractedContentsAsTorrentCollection(string encodedJson, string name)
private async Task<List<IngestedTorrent>> ParseTorrents(IDictionary<string, DmmContent> page)
{
var decodedJson = LZString.DecompressFromEncodedURIComponent(encodedJson);
var ingestedTorrents = new List<IngestedTorrent>();
var json = JsonDocument.Parse(decodedJson);
var options = new ParallelOptions { MaxDegreeOfParallelism = ParallelismCount };
await InsertTorrentsForPage(json);
var result = await Storage.MarkPageAsIngested(name);
if (!result.IsSuccess)
await Parallel.ForEachAsync(page, options, async (kvp, ct) =>
{
logger.LogWarning("Failed to mark page as ingested: [{Error}]", result.Failure.ErrorMessage);
return;
}
var (infoHash, dmmContent) = kvp;
var parsedTorrent = dmmContent.ParseResponse;
if (parsedTorrent is not { Success: true })
{
return;
}
logger.LogInformation("Successfully marked page as ingested");
var torrentType = parsedTorrent.Response.IsMovie ? "movie" : "tvSeries";
var cacheKey = GetCacheKey(torrentType, parsedTorrent.Response.ParsedTitle, parsedTorrent.Response.Year);
var (cached, cachedResult) = await CheckIfInCacheAndReturn(cacheKey);
if (cached)
{
logger.LogInformation("[{ImdbId}] Found cached imdb result for {Title}", cachedResult.ImdbId, parsedTorrent.Response.ParsedTitle);
lock (ingestedTorrents)
{
ingestedTorrents.Add(MapToTorrent(cachedResult, dmmContent.Bytes, infoHash, parsedTorrent));
}
return;
}
int? year = parsedTorrent.Response.Year != 0 ? parsedTorrent.Response.Year : null;
var imdbEntry = await Storage.FindImdbMetadata(parsedTorrent.Response.ParsedTitle, torrentType, year, ct);
if (imdbEntry is null)
{
return;
}
await AddToCache(cacheKey, imdbEntry);
logger.LogInformation("[{ImdbId}] Found best match for {Title}: {BestMatch} with score {Score}", imdbEntry.ImdbId, parsedTorrent.Response.ParsedTitle, imdbEntry.Title, imdbEntry.Score);
lock (ingestedTorrents)
{
ingestedTorrents.Add(MapToTorrent(imdbEntry, dmmContent.Bytes, infoHash, parsedTorrent));
}
});
return ingestedTorrents;
}
private async Task<IngestedTorrent?> ParseTorrent(JsonElement item)
{
if (!item.TryGetProperty("filename", out var filenameElement) ||
!item.TryGetProperty("bytes", out var bytesElement) ||
!item.TryGetProperty("hash", out var hashElement))
{
return null;
}
var torrentTitle = filenameElement.GetString();
if (torrentTitle.IsNullOrEmpty())
{
return null;
}
var parsedTorrent = rankTorrentName.Parse(torrentTitle);
if (!parsedTorrent.Success)
{
return null;
}
var torrentType = parsedTorrent.Response.IsMovie ? "movie" : "tvSeries";
var cacheKey = GetCacheKey(torrentType, parsedTorrent.Response.ParsedTitle, parsedTorrent.Response.Year);
var (cached, cachedResult) = await CheckIfInCacheAndReturn(cacheKey);
if (cached)
{
logger.LogInformation("[{ImdbId}] Found cached imdb result for {Title}", cachedResult.ImdbId, parsedTorrent.Response.ParsedTitle);
return MapToTorrent(cachedResult, bytesElement, hashElement, parsedTorrent);
}
int? year = parsedTorrent.Response.Year != 0 ? parsedTorrent.Response.Year : null;
var imdbEntry = await Storage.FindImdbMetadata(parsedTorrent.Response.ParsedTitle, torrentType, year);
if (imdbEntry is null)
{
return null;
}
await AddToCache(cacheKey, imdbEntry);
logger.LogInformation("[{ImdbId}] Found best match for {Title}: {BestMatch} with score {Score}", imdbEntry.ImdbId, parsedTorrent.Response.ParsedTitle, imdbEntry.Title, imdbEntry.Score);
return MapToTorrent(imdbEntry, bytesElement, hashElement, parsedTorrent);
}
private IngestedTorrent MapToTorrent(ImdbEntry result, JsonElement bytesElement, JsonElement hashElement, ParseTorrentTitleResponse parsedTorrent) =>
private IngestedTorrent MapToTorrent(ImdbEntry result, long size, string infoHash, ParseTorrentTitleResponse parsedTorrent) =>
new()
{
Source = Source,
Name = result.Title,
Imdb = result.ImdbId,
Size = bytesElement.GetInt64().ToString(),
InfoHash = hashElement.ToString(),
Size = size.ToString(),
InfoHash = infoHash,
Seeders = 0,
Leechers = 0,
Category = AssignCategory(result),
RtnResponse = parsedTorrent.Response.ToJson(),
};
private Task AddToCache(string cacheKey, ImdbEntry best)
{
@@ -163,53 +225,29 @@ public partial class DebridMediaManagerCrawler(
{
AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(1),
};
return cache.SetStringAsync(cacheKey, JsonSerializer.Serialize(best), cacheOptions);
}
private async Task<(bool Success, ImdbEntry? Entry)> CheckIfInCacheAndReturn(string cacheKey)
{
var cachedImdbId = await cache.GetStringAsync(cacheKey);
if (!string.IsNullOrEmpty(cachedImdbId))
{
return (true, JsonSerializer.Deserialize<ImdbEntry>(cachedImdbId));
}
return (false, null);
}
private async Task InsertTorrentsForPage(JsonDocument json)
private async Task<(bool Success, string? Name)> IsAlreadyIngested(string filename)
{
var torrents = await json.RootElement.EnumerateArray()
.ToAsyncEnumerable()
.SelectAwait(async x => await ParseTorrent(x))
.Where(t => t is not null)
.ToListAsync();
var pageIngested = await Storage.PageIngested(filename);
if (torrents.Count == 0)
{
logger.LogWarning("No torrents found in {Source} response", Source);
return;
}
await InsertTorrents(torrents!);
return (pageIngested, filename);
}
private async Task<(bool Success, string? Name)> IsAlreadyIngested(JsonElement entry)
{
var name = entry.GetProperty("path").GetString();
if (string.IsNullOrEmpty(name))
{
return (false, null);
}
var pageIngested = await Storage.PageIngested(name);
return (pageIngested, name);
}
private static string AssignCategory(ImdbEntry entry) =>
entry.Category.ToLower() switch
{
@@ -217,6 +255,22 @@ public partial class DebridMediaManagerCrawler(
var category when string.Equals(category, "tvSeries", StringComparison.OrdinalIgnoreCase) => "tv",
_ => "unknown",
};
private static string GetCacheKey(string category, string title, int year) => $"{category.ToLowerInvariant()}:{year}:{title.ToLowerInvariant()}";
private static ExtractedDMMContent? ParsePageContent(JsonElement item)
{
if (!item.TryGetProperty("filename", out var filenameElement) ||
!item.TryGetProperty("bytes", out var bytesElement) ||
!item.TryGetProperty("hash", out var hashElement))
{
return null;
}
return new(filenameElement.GetString(), bytesElement.GetInt64(), hashElement.GetString());
}
private record DmmContent(string Filename, long Bytes, ParseTorrentTitleResponse? ParseResponse);
private record ExtractedDMMContent(string Filename, long Bytes, string InfoHash);
private record RtnBatchProcessable(string InfoHash, string Filename);
}
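Two implementation notes on the rewritten crawler: parallelism is capped at four (`ParallelismCount`) for both page processing and RTN title matching, and because `List<T>` is not thread-safe, each `Add` inside `Parallel.ForEachAsync` is guarded by `lock (ingestedTorrents)`.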


@@ -1,9 +0,0 @@
namespace Producer.Features.Crawlers.Dmm;
public class GithubConfiguration
{
private const string Prefix = "GITHUB";
private const string PatVariable = "PAT";
public string? PAT { get; init; } = Prefix.GetOptionalEnvironmentVariableAsString(PatVariable);
}


@@ -0,0 +1,6 @@
namespace Producer.Features.Crawlers.Dmm;
public interface IDMMFileDownloader
{
Task<string> DownloadFileToTempPath(CancellationToken cancellationToken);
}


@@ -0,0 +1,17 @@
namespace Producer.Features.Crawlers.Dmm;
public static class ServiceCollectionExtensions
{
public static IServiceCollection AddDmmSupport(this IServiceCollection services)
{
services.AddHttpClient<IDMMFileDownloader, DMMFileDownloader>(DMMFileDownloader.ClientName, client =>
{
client.BaseAddress = new("https://github.com/debridmediamanager/hashlists/zipball/main/");
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
client.DefaultRequestHeaders.UserAgent.ParseAdd("curl");
client.Timeout = TimeSpan.FromMinutes(10); // 10 minute timeout, #217
});
return services;
}
}
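A minimal consumption sketch for the registration above (hypothetical host wiring; only `AddDmmSupport` and `IDMMFileDownloader` come from this changeset, and .NET implicit usings are assumed):
```
using Microsoft.Extensions.DependencyInjection;
using Producer.Features.Crawlers.Dmm;

var provider = new ServiceCollection()
    .AddLogging()      // DMMFileDownloader takes an ILogger<DMMFileDownloader>
    .AddDmmSupport()   // registers the named/typed HttpClient shown above
    .BuildServiceProvider();

var downloader = provider.GetRequiredService<IDMMFileDownloader>();

// Downloads the hashlists zipball and returns the flattened extraction directory.
var dir = await downloader.DownloadFileToTempPath(CancellationToken.None);
Console.WriteLine($"DMM hashlists extracted to {dir}");
```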


@@ -1,11 +1,10 @@
namespace Producer.Features.Crawlers.EzTv;
public class EzTvCrawler(IHttpClientFactory httpClientFactory, ILogger<EzTvCrawler> logger, IDataStorage storage) : BaseXmlCrawler(httpClientFactory, logger, storage)
public class EzTvCrawler(IHttpClientFactory httpClientFactory, ILogger<EzTvCrawler> logger, IDataStorage storage, ScrapeConfiguration scrapeConfiguration) : BaseXmlCrawler(httpClientFactory, logger, storage)
{
protected override string Url => "https://eztv1.xyz/ezrss.xml";
protected override string Url => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncEzTvJob", StringComparison.OrdinalIgnoreCase))?.Url ?? string.Empty;
protected override string Source => "EZTV";
private static readonly XNamespace XmlNamespace = "http://xmlns.ezrss.it/0.1/";
private XNamespace XmlNamespace => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncEzTvJob", StringComparison.OrdinalIgnoreCase))?.XmlNamespace ?? string.Empty;
protected override IReadOnlyDictionary<string, string> Mappings =>
new Dictionary<string, string>


@@ -1,11 +1,10 @@
namespace Producer.Features.Crawlers.Nyaa;
public class NyaaCrawler(IHttpClientFactory httpClientFactory, ILogger<NyaaCrawler> logger, IDataStorage storage) : BaseXmlCrawler(httpClientFactory, logger, storage)
public class NyaaCrawler(IHttpClientFactory httpClientFactory, ILogger<NyaaCrawler> logger, IDataStorage storage, ScrapeConfiguration scrapeConfiguration) : BaseXmlCrawler(httpClientFactory, logger, storage)
{
protected override string Url => "https://nyaa.si/?page=rss&c=1_2&f=0";
protected override string Url => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncNyaaJob", StringComparison.OrdinalIgnoreCase))?.Url ?? string.Empty;
protected override string Source => "Nyaa";
private static readonly XNamespace XmlNamespace = "https://nyaa.si/xmlns/nyaa";
private XNamespace XmlNamespace => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncNyaaJob", StringComparison.OrdinalIgnoreCase))?.XmlNamespace ?? string.Empty;
protected override IReadOnlyDictionary<string, string> Mappings =>
new Dictionary<string, string>


@@ -1,13 +1,13 @@
namespace Producer.Features.Crawlers.Tgx;
public partial class TgxCrawler(IHttpClientFactory httpClientFactory, ILogger<TgxCrawler> logger, IDataStorage storage) : BaseXmlCrawler(httpClientFactory, logger, storage)
public partial class TgxCrawler(IHttpClientFactory httpClientFactory, ILogger<TgxCrawler> logger, IDataStorage storage, ScrapeConfiguration scrapeConfiguration) : BaseXmlCrawler(httpClientFactory, logger, storage)
{
[GeneratedRegex(@"Size:\s+(.+?)\s+Added")]
private static partial Regex SizeStringExtractor();
[GeneratedRegex(@"(?i)\b(\d+(\.\d+)?)\s*([KMGT]?B)\b", RegexOptions.None, "en-GB")]
private static partial Regex SizeStringParser();
protected override string Url => "https://tgx.rs/rss";
protected override string Url => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncTgxJob", StringComparison.OrdinalIgnoreCase))?.Url ?? string.Empty;
protected override string Source => "TorrentGalaxy";
protected override IReadOnlyDictionary<string, string> Mappings


@@ -1,8 +1,8 @@
namespace Producer.Features.Crawlers.Tpb;
public class TpbCrawler(IHttpClientFactory httpClientFactory, ILogger<TpbCrawler> logger, IDataStorage storage) : BaseJsonCrawler(httpClientFactory, logger, storage)
public class TpbCrawler(IHttpClientFactory httpClientFactory, ILogger<TpbCrawler> logger, IDataStorage storage, ScrapeConfiguration scrapeConfiguration) : BaseJsonCrawler(httpClientFactory, logger, storage)
{
protected override string Url => "https://apibay.org/precompiled/data_top100_recent.json";
protected override string Url => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncTpbJob", StringComparison.OrdinalIgnoreCase))?.Url ?? string.Empty;
protected override string Source => "TPB";


@@ -1,9 +1,8 @@
namespace Producer.Features.Crawlers.Yts;
public class YtsCrawler(IHttpClientFactory httpClientFactory, ILogger<YtsCrawler> logger, IDataStorage storage) : BaseXmlCrawler(httpClientFactory, logger, storage)
public class YtsCrawler(IHttpClientFactory httpClientFactory, ILogger<YtsCrawler> logger, IDataStorage storage, ScrapeConfiguration scrapeConfiguration) : BaseXmlCrawler(httpClientFactory, logger, storage)
{
protected override string Url => "https://yts.am/rss";
protected override string Url => scrapeConfiguration.Scrapers.FirstOrDefault(x => x.Name.Equals("SyncYtsJob", StringComparison.OrdinalIgnoreCase))?.Url ?? string.Empty;
protected override string Source => "YTS";
protected override IReadOnlyDictionary<string, string> Mappings
=> new Dictionary<string, string>


@@ -5,7 +5,6 @@ internal static class ServiceCollectionExtensions
internal static IServiceCollection AddQuartz(this IServiceCollection services, IConfiguration configuration)
{
var scrapeConfiguration = services.LoadConfigurationFromConfig<ScrapeConfiguration>(configuration, ScrapeConfiguration.SectionName);
var githubConfiguration = services.LoadConfigurationFromEnv<GithubConfiguration>();
var rabbitConfiguration = services.LoadConfigurationFromEnv<RabbitMqConfiguration>();
var jobTypes = Assembly.GetAssembly(typeof(BaseJob))
@@ -19,18 +18,13 @@ internal static class ServiceCollectionExtensions
services.AddTransient(type);
}
if (!string.IsNullOrEmpty(githubConfiguration.PAT))
{
services.AddTransient<SyncDmmJob>();
}
var openMethod = typeof(ServiceCollectionExtensions).GetMethod(nameof(AddJobWithTrigger), BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.Instance);
services.AddQuartz(
quartz =>
{
RegisterAutomaticRegistrationJobs(jobTypes, openMethod, quartz, scrapeConfiguration);
RegisterDmmJob(githubConfiguration, quartz, scrapeConfiguration);
RegisterDmmJob(quartz, scrapeConfiguration);
RegisterTorrentioJob(services, quartz, configuration, scrapeConfiguration);
RegisterPublisher(quartz, rabbitConfiguration);
});
@@ -64,13 +58,8 @@ internal static class ServiceCollectionExtensions
}
}
private static void RegisterDmmJob(GithubConfiguration githubConfiguration, IServiceCollectionQuartzConfigurator quartz, ScrapeConfiguration scrapeConfiguration)
{
if (!string.IsNullOrEmpty(githubConfiguration.PAT))
{
AddJobWithTrigger<SyncDmmJob>(quartz, SyncDmmJob.Key, SyncDmmJob.Trigger, scrapeConfiguration);
}
}
private static void RegisterDmmJob(IServiceCollectionQuartzConfigurator quartz, ScrapeConfiguration scrapeConfiguration) =>
AddJobWithTrigger<SyncDmmJob>(quartz, SyncDmmJob.Key, SyncDmmJob.Trigger, scrapeConfiguration);
private static void RegisterTorrentioJob(
IServiceCollection services,


@@ -1,12 +1,12 @@
// Global using directives
global using System.Collections.Concurrent;
global using System.IO.Compression;
global using System.Reflection;
global using System.Text;
global using System.Text.Json;
global using System.Text.RegularExpressions;
global using System.Xml.Linq;
global using FuzzySharp;
global using FuzzySharp.Extractor;
global using FuzzySharp.PreProcess;
global using FuzzySharp.SimilarityRatio.Scorer;
global using FuzzySharp.SimilarityRatio.Scorer.StrategySensitive;


@@ -12,7 +12,8 @@ builder.Services
.RegisterMassTransit()
.AddDataStorage()
.AddCrawlers()
.AddDmmSupport()
.AddQuartz(builder.Configuration);
var app = builder.Build();
app.Run();
app.Run();


@@ -1 +1 @@
rank-torrent-name==0.2.5
rank-torrent-name==0.2.13


@@ -15,7 +15,7 @@ WORKDIR /app
ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3=~3.11.8-r0 py3-pip && ln -sf python3 /usr/bin/python
RUN apk add --update --no-cache python3=~3.11 py3-pip && ln -sf python3 /usr/bin/python
COPY --from=build /src/out .


@@ -44,6 +44,7 @@ public static class ServiceCollectionExtensions
{
var rabbitConfiguration = services.LoadConfigurationFromEnv<RabbitMqConfiguration>();
var redisConfiguration = services.LoadConfigurationFromEnv<RedisConfiguration>();
var qbitConfiguration = services.LoadConfigurationFromEnv<QbitConfiguration>();
services.AddStackExchangeRedisCache(
option =>
@@ -80,8 +81,8 @@ public static class ServiceCollectionExtensions
e.ConfigureConsumer<WriteQbitMetadataConsumer>(context);
e.ConfigureConsumer<PerformQbitMetadataRequestConsumer>(context);
e.ConfigureSaga<QbitMetadataSagaState>(context);
e.ConcurrentMessageLimit = 5;
e.PrefetchCount = 5;
e.ConcurrentMessageLimit = qbitConfiguration.Concurrency;
e.PrefetchCount = qbitConfiguration.Concurrency;
});
});
});
@@ -98,7 +99,7 @@ public static class ServiceCollectionExtensions
cfg.UseTimeout(
timeout =>
{
timeout.Timeout = TimeSpan.FromMinutes(1);
timeout.Timeout = TimeSpan.FromMinutes(3);
});
})
.RedisRepository(redisConfiguration.ConnectionString, options =>
@@ -110,7 +111,7 @@ public static class ServiceCollectionExtensions
{
var qbitConfiguration = services.LoadConfigurationFromEnv<QbitConfiguration>();
var client = new QBittorrentClient(new(qbitConfiguration.Host));
client.Timeout = TimeSpan.FromSeconds(10);
client.Timeout = TimeSpan.FromSeconds(20);
services.AddSingleton<IQBittorrentClient>(client);
}
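Taken together with the `QbitConfiguration` hunk below, `QBIT_CONCURRENCY` (default 8) now drives both `ConcurrentMessageLimit` and `PrefetchCount`, the saga timeout grows from one minute to three, and the qBittorrent client timeout from 10 to 20 seconds.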


@@ -1,6 +1,6 @@
namespace QBitCollector.Features.Qbit;
public class QbitRequestProcessor(IQBittorrentClient client, ITrackersService trackersService, ILogger<QbitRequestProcessor> logger)
public class QbitRequestProcessor(IQBittorrentClient client, ITrackersService trackersService, ILogger<QbitRequestProcessor> logger, QbitConfiguration configuration)
{
public async Task<IReadOnlyList<TorrentContent>?> ProcessAsync(string infoHash, CancellationToken cancellationToken = default)
{
@@ -14,7 +14,7 @@ public class QbitRequestProcessor(IQBittorrentClient client, ITrackersService tr
using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
timeoutCts.CancelAfter(TimeSpan.FromSeconds(30));
timeoutCts.CancelAfter(TimeSpan.FromSeconds(60));
try
{
@@ -30,7 +30,7 @@ public class QbitRequestProcessor(IQBittorrentClient client, ITrackersService tr
break;
}
await Task.Delay(TimeSpan.FromSeconds(1), timeoutCts.Token);
await Task.Delay(TimeSpan.FromMilliseconds(200), timeoutCts.Token);
}
}
catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)


@@ -5,7 +5,10 @@ public class QbitConfiguration
private const string Prefix = "QBIT";
private const string HOST_VARIABLE = "HOST";
private const string TRACKERS_URL_VARIABLE = "TRACKERS_URL";
private const string CONCURRENCY_VARIABLE = "CONCURRENCY";
public string? Host { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(HOST_VARIABLE);
public string? TrackersUrl { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(TRACKERS_URL_VARIABLE);
public int Concurrency { get; init; } = Prefix.GetEnvironmentVariableAsInt(CONCURRENCY_VARIABLE, 8);
}


@@ -5,6 +5,12 @@ public class WriteQbitMetadataConsumer(IRankTorrentName rankTorrentName, IDataSt
public async Task Consume(ConsumeContext<WriteQbitMetadata> context)
{
var request = context.Message;
if (request.Metadata.Metadata.Count == 0)
{
await context.Publish(new QbitMetadataWritten(request.Metadata, false));
return;
}
var torrentFiles = QbitMetaToTorrentMeta.MapMetadataToFilesCollection(
rankTorrentName, request.Torrent, request.ImdbId, request.Metadata.Metadata, logger);


@@ -1 +1 @@
rank-torrent-name==0.2.5
rank-torrent-name==0.2.13


@@ -8,12 +8,14 @@ public class PostgresConfiguration
private const string PasswordVariable = "PASSWORD";
private const string DatabaseVariable = "DB";
private const string PortVariable = "PORT";
private const string CommandTimeoutVariable = "COMMAND_TIMEOUT_SEC"; // Seconds
private string Host { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(HostVariable);
private string Username { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(UsernameVariable);
private string Password { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(PasswordVariable);
private string Database { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(DatabaseVariable);
private int PORT { get; init; } = Prefix.GetEnvironmentVariableAsInt(PortVariable, 5432);
private int CommandTimeout { get; init; } = Prefix.GetEnvironmentVariableAsInt(CommandTimeoutVariable, 300);
public string StorageConnectionString => $"Host={Host};Port={PORT};Username={Username};Password={Password};Database={Database};";
public string StorageConnectionString => $"Host={Host};Port={PORT};Username={Username};Password={Password};Database={Database};CommandTimeout={CommandTimeout}";
}


@@ -152,7 +152,8 @@ public class DapperDataStorage(PostgresConfiguration configuration, RabbitMqConf
INSERT INTO files
("infoHash", "fileIndex", title, "size", "imdbId", "imdbSeason", "imdbEpisode", "kitsuId", "kitsuEpisode", "createdAt", "updatedAt")
VALUES
(@InfoHash, @FileIndex, @Title, @Size, @ImdbId, @ImdbSeason, @ImdbEpisode, @KitsuId, @KitsuEpisode, Now(), Now());
(@InfoHash, @FileIndex, @Title, @Size, @ImdbId, @ImdbSeason, @ImdbEpisode, @KitsuId, @KitsuEpisode, Now(), Now())
ON CONFLICT ("infoHash", "fileIndex") DO NOTHING;
""";
await connection.ExecuteAsync(query, files);
@@ -167,7 +168,8 @@ public class DapperDataStorage(PostgresConfiguration configuration, RabbitMqConf
INSERT INTO subtitles
("infoHash", "fileIndex", "fileId", "title")
VALUES
(@InfoHash, @FileIndex, @FileId, @Title);
(@InfoHash, @FileIndex, @FileId, @Title)
ON CONFLICT ("infoHash", "fileIndex") DO NOTHING;
""";
await connection.ExecuteAsync(query, subtitles);


@@ -0,0 +1,19 @@
namespace SharedContracts.Extensions;
public static class DictionaryExtensions
{
public static ConcurrentDictionary<TKey, TValue> ToConcurrentDictionary<TSource, TKey, TValue>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TValue> valueSelector) where TKey : notnull
{
var concurrentDictionary = new ConcurrentDictionary<TKey, TValue>();
foreach (var element in source)
{
concurrentDictionary.TryAdd(keySelector(element), valueSelector(element));
}
return concurrentDictionary;
}
}
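A usage sketch mirroring how the DMM crawler groups torrents by info hash (sample data is invented):
```
using System.Linq;
using SharedContracts.Extensions;

var records = new[]
{
    (Hash: "abc", Title: "Movie.2024.1080p.mkv"),
    (Hash: "abc", Title: "Movie.2024.1080p.dupe.mkv"),
    (Hash: "def", Title: "Show.S01E01.720p.mkv"),
};

// TryAdd keeps the first value seen per key, so duplicate hashes are dropped.
var byHash = records
    .GroupBy(r => r.Hash)
    .ToConcurrentDictionary(g => g.Key, g => g.First().Title);

Console.WriteLine(byHash.Count); // prints 2
```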


@@ -1,5 +1,6 @@
// Global using directives
global using System.Collections.Concurrent;
global using System.Text.Json;
global using System.Text.Json.Serialization;
global using System.Text.RegularExpressions;


@@ -2,5 +2,6 @@ namespace SharedContracts.Python.RTN;
public interface IRankTorrentName
{
ParseTorrentTitleResponse Parse(string title, bool trashGarbage = true);
ParseTorrentTitleResponse Parse(string title, bool trashGarbage = true, bool logErrors = false, bool throwOnErrors = false);
List<ParseTorrentTitleResponse?> BatchParse(IReadOnlyCollection<string> titles, int chunkSize = 500, int workers = 20, bool trashGarbage = true, bool logErrors = false, bool throwOnErrors = false);
}
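A hedged usage sketch for the new batch API (assumes an `IRankTorrentName` instance, here `rtn`, already resolved from DI; implicit usings assumed):
```
var titles = new List<string>
{
    "Movie.Name.2024.1080p.WEB-DL.x264",
    "Show.Name.S02E05.720p.HDTV",
};

// Failed parses come back with Success == false instead of throwing.
var results = rtn.BatchParse(titles, trashGarbage: false);

foreach (var result in results.Where(r => r is { Success: true }))
{
    Console.WriteLine(result!.Response.ParsedTitle);
}
```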


@@ -12,41 +12,102 @@ public class RankTorrentName : IRankTorrentName
_pythonEngineService = pythonEngineService;
InitModules();
}
public ParseTorrentTitleResponse Parse(string title, bool trashGarbage = true) =>
_pythonEngineService.ExecutePythonOperationWithDefault(
() =>
{
var result = _rtn?.parse(title, trashGarbage);
return ParseResult(result);
}, new ParseTorrentTitleResponse(false, null), nameof(Parse), throwOnErrors: false, logErrors: false);
private static ParseTorrentTitleResponse ParseResult(dynamic result)
public ParseTorrentTitleResponse Parse(string title, bool trashGarbage = true, bool logErrors = false, bool throwOnErrors = false)
{
if (result == null)
try
{
using var gil = Py.GIL();
var result = _rtn?.parse(title, trashGarbage);
return ParseResult(result);
}
catch (Exception ex)
{
if (logErrors)
{
_pythonEngineService.Logger.LogError(ex, "Python Error: {Message} ({OperationName})", ex.Message, nameof(Parse));
}
if (throwOnErrors)
{
throw;
}
return new(false, null);
}
}
public List<ParseTorrentTitleResponse?> BatchParse(IReadOnlyCollection<string> titles, int chunkSize = 500, int workers = 20, bool trashGarbage = true, bool logErrors = false, bool throwOnErrors = false)
{
var responses = new List<ParseTorrentTitleResponse?>();
try
{
if (titles.Count == 0)
{
return responses;
}
using var gil = Py.GIL();
var pythonList = new PyList(titles.Select(x => new PyString(x).As<PyObject>()).ToArray());
PyList results = _rtn?.batch_parse(pythonList, trashGarbage, chunkSize, workers);
if (results == null)
{
return responses;
}
responses.AddRange(results.Select(ParseResult));
}
catch (Exception ex)
{
if (logErrors)
{
_pythonEngineService.Logger.LogError(ex, "Python Error: {Message} ({OperationName})", ex.Message, nameof(Parse));
}
if (throwOnErrors)
{
throw;
}
}
return responses;
}
private static ParseTorrentTitleResponse? ParseResult(dynamic result)
{
try
{
if (result == null)
{
return new(false, null);
}
var json = result.model_dump_json()?.As<string?>();
if (json is null || string.IsNullOrEmpty(json))
{
return new(false, null);
}
var mediaType = result.GetAttr("type")?.As<string>();
if (string.IsNullOrEmpty(mediaType))
{
return new(false, null);
}
var response = JsonSerializer.Deserialize<RtnResponse>(json);
response.IsMovie = mediaType.Equals("movie", StringComparison.OrdinalIgnoreCase);
return new(true, response);
}
catch
{
return new(false, null);
}
var json = result.model_dump_json()?.As<string?>();
if (json is null || string.IsNullOrEmpty(json))
{
return new(false, null);
}
var mediaType = result.GetAttr("type")?.As<string>();
if (string.IsNullOrEmpty(mediaType))
{
return new(false, null);
}
var response = JsonSerializer.Deserialize<RtnResponse>(json);
response.IsMovie = mediaType.Equals("movie", StringComparison.OrdinalIgnoreCase);
return new(true, response);
}
private void InitModules() =>
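Note that `Parse` and `BatchParse` now take the Python GIL explicitly via `Py.GIL()` and expose `logErrors`/`throwOnErrors` flags, rather than routing every call through the old `ExecutePythonOperationWithDefault` wrapper.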


@@ -15,7 +15,7 @@
<PackageReference Include="Dapper" Version="2.1.35" />
<PackageReference Include="MassTransit.Abstractions" Version="8.2.0" />
<PackageReference Include="MassTransit.RabbitMQ" Version="8.2.0" />
<PackageReference Include="Npgsql" Version="8.0.2" />
<PackageReference Include="Npgsql" Version="8.0.3" />
<PackageReference Include="pythonnet" Version="3.0.3" />
<PackageReference Include="Serilog" Version="3.1.1" />
<PackageReference Include="Serilog.Extensions.Hosting" Version="8.0.0" />


@@ -8,12 +8,14 @@ public class PostgresConfiguration
private const string PasswordVariable = "PASSWORD";
private const string DatabaseVariable = "DB";
private const string PortVariable = "PORT";
private const string CommandTimeoutVariable = "COMMAND_TIMEOUT_SEC"; // Seconds
private string Host { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(HostVariable);
private string Username { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(UsernameVariable);
private string Password { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(PasswordVariable);
private string Database { get; init; } = Prefix.GetRequiredEnvironmentVariableAsString(DatabaseVariable);
private int PORT { get; init; } = Prefix.GetEnvironmentVariableAsInt(PortVariable, 5432);
private int CommandTimeout { get; init; } = Prefix.GetEnvironmentVariableAsInt(CommandTimeoutVariable, 300);
public string StorageConnectionString => $"Host={Host};Port={PORT};Username={Username};Password={Password};Database={Database};";
public string StorageConnectionString => $"Host={Host};Port={PORT};Username={Username};Password={Password};Database={Database};CommandTimeout={CommandTimeout}";
}


@@ -12,7 +12,7 @@
<PackageReference Include="Dapper" Version="2.1.28" />
<PackageReference Include="Microsoft.Extensions.Hosting" Version="8.0.0" />
<PackageReference Include="Microsoft.Extensions.Http" Version="8.0.0" />
<PackageReference Include="Npgsql" Version="8.0.1" />
<PackageReference Include="Npgsql" Version="8.0.3" />
<PackageReference Include="Serilog" Version="3.1.1" />
<PackageReference Include="Serilog.AspNetCore" Version="8.0.1" />
<PackageReference Include="Serilog.Sinks.Console" Version="5.0.1" />