diff --git a/Makefile b/Makefile
index 09f0686..52a1ac3 100644
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,7 @@ feed_cleanup: ## Cleanup RSS feeds
 	@python3 ./scripts/cleanup.py
 
 feed_init: ## Initialize feeds from boards.yml
-	@python3 ./scripts/initialize.py
+	@python3 ./scripts/initialize.py --config boards.yml --no-upload-favicons -y
 
 feed_refresh: ## Refresh RSS feeds
 	@python3 ./scripts/update.py
diff --git a/README.md b/README.md
index b5a788e..f4cf95d 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,121 @@
-# infomate.club
+# 😋 [infomate.club](https://infomate.club) [](https://travis-ci.org/vas3k/infomate.club)
 
-Experimental project
+Infomate is a small web service that shows multiple RSS sources on one page and does some tricky parsing and summarization of articles using the TextRank algorithm.
 
-### Build and run
+It helps you keep track of news from different areas without subscribing to hundreds of media accounts and getting annoying notifications.
 
-```shell script
+Thematic and people-based collections do a really good job of helping you discover new sources of information.
+Since we are all biased, such compilations can really help us get out of our information bubbles.
+
+Live URL: [infomate.club](https://infomate.club)
+
+
+## This is a pet-project 🐶
+
+Which means you really shouldn't expect much from it. I wrote it over a weekend to solve my own pain.
+No state-of-the-art Kubernetes bullshit, no architecture patterns, not even any tests.
+It's here just to show people what a pet-project might look like.
+
+I wrote this code for fun, not for work. That's usually a huge difference.
+
+
+Like the difference between riding a bike on the streets and cycling in the wild for fun :)
+
+## How it works
+
+It's basically a Django web app with a bunch of [scripts](scripts) for RSS parsing.
+It stores the parsed data in a PostgreSQL database.
+
+The web app is only used to show the data (with heavy caching).
+Parsing and feed updates are performed by three scripts running in cron. Like poor people do.
+
+[Feedparser](https://pythonhosted.org/feedparser/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) are used to find, download and parse RSS.
+
+Text summarization is done via [newspaper3k](https://newspaper.readthedocs.io/en/latest/) with some additional
+protection against bad types of content, like podcasts and pages that are simply too big, which can eat all your memory. Anything can happen in the RSS world :)
+
+## Running it locally
+
+The easy way: install [Docker](https://docs.docker.com/install/) on your machine, then run:
+
+```
+git clone git@github.com:vas3k/infomate.club.git
+cd infomate.club
 docker-compose up --build
 ```
 
-### Run
+After that, navigate to [localhost:8000](http://localhost:8000).
 
-```shell script
-docker-compose up
-```
-
-### Terminate
+To terminate:
 
 ```shell script
 docker-compose down --remove-orphans
 ```
+
+
+## Running for development
+
+Make sure you have Python 3 and PostgreSQL installed locally.
+
+#### Step 1: Install requirements
+
+```
+pip3 install -r requirements.txt --user
+```
+
+#### Step 2: Create the database structure
+
+```
+python3 manage.py migrate
+```
+
+#### Step 3: Take a look at [boards.yml](boards.yml)
+
+This is the main source of truth for all RSS streams and collections in the service.
+All updates to the database are made through it. For your first run, you can just use the existing one.
+
+#### Step 4: Initialize your feeds
+
+```
+python3 scripts/initialize.py --config boards.yml
+```
+
+> Every time you make a change to boards.yml, just run this script again.
+> It is smart enough to create missing feeds and remove old ones.
+
+#### Step 5: Fetch some articles
+
+```
+python3 scripts/update.py
+```
+
+> Don't run it too often, otherwise sites may ban your IP.
+> There is a hardcoded cooldown interval for each feed, but you can use the `--force` flag to ignore it.
+
+#### Step 6: Run dev server
+
+```
+python3 manage.py runserver 8000
+```
+
+Then go to [localhost:8000](http://localhost:8000) again.
+
+## boards.yml format
+
+TBD
+
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
+
+You can help us with open issues too. There's always something to work on.
+
+We don't have any strict formatting rules; just explain your motivation and the changes you've made in the PR description so that others understand what's going on.
+
+## License
+
+[Apache 2.0](LICENSE) © Vasily Zubarev
+
+> TL;DR: you can modify, distribute and use it commercially,
+but you MUST reference the original author or give a link to the service
\ No newline at end of file
diff --git a/boards.yml b/boards.yml
index cd4abdc..5fea7a7 100644
--- a/boards.yml
+++ b/boards.yml
@@ -1,46 +1,18 @@
 boards:
-- name: Технологии
+- name: Tech
   slug: tech
-  is_visible: true
-  is_private: false
+  is_visible: true # visibility on the main page
+  is_private: false # logging in is not required to view
   curator:
-    name: Технологии
-    title: Главные новости
+    name: Tech
+    title: Main news
     avatar: https://i.vas3k.ru/fhr.png
-    bio: Подборка основных изданий о технологиях на русском и английском языках
+    bio: Major technology media in English and Russian
     footer: >
-      это общая подборка популярных технологических СМИ.
-      Она сделана по итогам опроса в моём телеграм-канале.
-      Жирным выделяются свежие статьи.
-      Страница обновляется раз в час.
+      this is a general selection of popular technology media.
+      The page is updated once per hour.
   blocks:
-    - name: На русском
-      slug: ru
-      feeds:
-        - name: "vc.ru: Технологии"
-          url: https://vc.ru
-          rss: https://vc.ru/rss/all
-          conditions:
-            - type: in
-              field: link
-              in: "https://vc.ru/tech/"
-        - name: TJ
-          url: https://tjournal.ru
-          rss: https://tjournal.ru/rss/all
-        - name: "Хабр: лучшее за сутки"
-          url: https://habr.ru
-          rss: https://habr.com/ru/rss/best/daily/?fl=ru
-        - name: iXBT
-          url: https://www.ixbt.com
-          rss: http://www.ixbt.com/export/news.rss
-          icon: https://i.vas3k.ru/fkm.jpg
-        - name: Tproger
-          url: https://tproger.ru/
-          rss: https://tproger.ru/feed/
-        - name: OpenNet
-          url: https://www.opennet.ru/
-          rss: https://www.opennet.ru/opennews/opennews_6.rss
-    - name: На английском
+    - name: English
       slug: en
       feeds:
         - name: Hacker News
@@ -52,6 +24,7 @@ boards:
         - name: TechCrunch
           rss: http://feeds.feedburner.com/TechCrunch/
           url: https://techcrunch.com
+          is_parsable: false # do not try to parse pages, show RSS content only
         - name: Engadget
           rss: https://www.engadget.com/rss.xml
           url: https://www.engadget.com
@@ -86,7 +59,34 @@ boards:
         - name: ReadWrite
           url: https://readwrite.com
           rss: https://readwrite.com/feed/
-    - name: Микс блогов
+    - name: Russian
+      slug: ru
+      feeds:
+        - name: "vc.ru"
+          url: https://vc.ru
+          rss: https://vc.ru/rss/all
+          is_parsable: false
+          conditions:
+            - type: in
+              field: link
+              in: "https://vc.ru/tech/" # just an example, no real benefits
+        - name: TJ
+          url: https://tjournal.ru
+          rss: https://tjournal.ru/rss/all
+        - name: "Habr.com"
+          url: https://habr.com
+          rss: https://habr.com/ru/rss/best/daily/?fl=ru
+        - name: iXBT
+          url: https://www.ixbt.com
+          rss: http://www.ixbt.com/export/news.rss
+          icon: https://i.vas3k.ru/fkm.jpg
+        - name: Tproger
+          url: https://tproger.ru/
+          rss: https://tproger.ru/feed/
+        - name: OpenNet
+          url: https://www.opennet.ru/
+          rss: https://www.opennet.ru/opennews/opennews_6.rss
+    - name: Mix
       slug: the_mix
       feeds:
         - url: http://www.rssmix.com/
@@ -95,7 +95,7 @@ boards:
           mix:
             - http://vas3k.ru/rss/
             - http://nedbatchelder.com/blog/rss.xml
-    - name: Мейнстрим
+    - name: Mainstream
       slug: mainstream
       feeds:
         - name: "WSJ: Tech"
@@ -109,31 +109,28 @@ boards:
           rss: http://feeds.reuters.com/reuters/technologyNews
 
 
-- name: Вастрик
-  slug: vas3k
-  is_visible: true
-  is_private: true
-  curator:
-    name: Вастрик
-    url: https://vas3k.ru
-    title: Айти и путешествия
-    avatar: https://i.vas3k.ru/eb8.png
-    bio: Веду блог о технологиях, пишу код, отвратительно путешествую и фотографирую это
-    footer: >
-      здесь я собрал сайты, которые составляют 90% того, что я читаю постоянно.
-      Отбор и фильтрация источников — непрерывный процесс для меня, потому их набор постоянно меняется.
-      Так что следите.
-  blocks:
-
 - name: How to Berlin
   slug: howtoberlin
   is_visible: true
-  is_private: true
+  is_private: false
   curator:
-    name: Лена How to Berlin
+    name: How to Berlin
     url: https://howtoberlin.de
-    title: Набор Берлинца
+    title: Berliner kit
     avatar: https://i.vas3k.ru/fev.png
-    bio: Что читать когда переехал в Берлин и не понимаешь что происходит вокруг
+    bio: What to read when you've just moved to Berlin and don't understand what's going on around you
   blocks:
-
+    - name: Main and expat news
+      slug: news
+      feeds:
+        - name: "Berlin.de"
+          url: https://www.berlin.de/aktuelles/
+          rss: https://www.berlin.de/en/news/index.rss
+          icon: https://i.vas3k.ru/fjc.png
+        - name: "DW.com"
+          url: https://www.dw.com/en/top-stories/germany/s-1432
+          rss: http://rss.dw.com/rdf/rss-en-ger
+        - name: "TheLocal"
+          url: https://www.thelocal.de/
+          rss: https://feeds.thelocal.com/rss/de
+          is_parsable: false
diff --git a/docker-compose.yml b/docker-compose.yml
index 460cd0c..9e3fe08 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,7 +1,7 @@
 version: '3.7'
 
 services:
-  infomate_app:
+  infomate_app: &app
     build:
       context: .
       args:
@@ -28,3 +28,10 @@ services:
       - POSTGRES_DB=infomate
     ports:
       - 5432
+
+  migrate_and_init:
+    <<: *app
+    container_name: infomate_migrate_and_init
+    restart: "no"
+    ports: []
+    command: make migrate feed_init
diff --git a/templates/layout.html b/templates/layout.html
index a0ea209..dd19f04 100644
--- a/templates/layout.html
+++ b/templates/layout.html
@@ -41,7 +41,7 @@ {% block footer %}