Update readme and installation steps

Vasily Zubarev
2020-02-07 13:56:52 +01:00
parent bde93a4b4c
commit e178f9ff2e
5 changed files with 177 additions and 73 deletions


@@ -17,7 +17,7 @@ feed_cleanup: ## Cleanup RSS feeds
@python3 ./scripts/cleanup.py
feed_init: ## Initialize feeds from boards.yml
@python3 ./scripts/initialize.py
@python3 ./scripts/initialize.py --config boards.yml --no-upload-favicons -y
feed_refresh: ## Refresh RSS feeds
@python3 ./scripts/update.py

README.md

@@ -1,21 +1,121 @@
# infomate.club
# 😋 [infomate.club](https://infomate.club) [![Build Status](https://travis-ci.org/vas3k/infomate.club.svg?branch=master)](https://travis-ci.org/vas3k/infomate.club)
Experimental project
Infomate is a small web service that shows multiple RSS sources on one page, and parses and summarizes articles using the TextRank algorithm.
### Build and run
It helps to keep track of news from different areas without subscribing to hundreds of media accounts and getting annoying notifications.
```shell script
Thematic and people-based collections do a really good job of surfacing new sources of information.
Since we are all biased, such compilations can really help us get out of our information bubbles.
Live URL: [infomate.club](https://infomate.club)
![](https://i.vas3k.ru/i7m.png)
## This is a pet-project 🐶
Which means you really shouldn't expect much from it. I wrote it over the weekend to solve my own pain.
No state-of-the-art Kubernetes bullshit, no architecture patterns, not even any tests.
It's here just to show people what a pet-project might look like.
I wrote this code for fun, not for work. That's usually a huge difference.
Like between riding a bike on the streets and cycling in the wild for fun :)
## How it works
It's basically a Django web app with a bunch of [scripts](scripts) for RSS parsing.
It stores the parsed data in a PostgreSQL database.
The web app is only used to show the data (with heavy caching).
Parsing and feed updates are performed by the three scripts running in cron. Like poor people do.
[Feedparser](https://pythonhosted.org/feedparser/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) are used to find, download and parse RSS.
Text summarization is done via [newspaper3k](https://newspaper.readthedocs.io/en/latest/) with some additional
protection against bad content types such as podcasts, and against oversized pages in general, which can eat all your memory. Anything can happen in the RSS world :)
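
A rough sketch of what such a guard might look like. This is not the code from this repo: the function name, the list of allowed types and the size limit are all made up for illustration.

```python
from typing import Optional

# Hypothetical limits, chosen only for this example.
MAX_PAGE_BYTES = 5 * 1024 * 1024
PARSABLE_TYPES = ("text/html", "application/xhtml+xml")


def is_safe_to_parse(
    content_type: Optional[str],
    content_length: Optional[int],
    max_bytes: int = MAX_PAGE_BYTES,
) -> bool:
    """Return True only for reasonably small HTML-like responses."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in PARSABLE_TYPES:
        # Podcasts, videos, PDFs and so on end up here.
        return False
    # In this sketch an unknown size is treated as unsafe.
    if content_length is None or content_length > max_bytes:
        return False
    return True
```

The idea is to look at the response headers first and only hand the page to the summarizer when both checks pass.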
## Running it locally
The easy way. Install [docker](https://docs.docker.com/install/) on your machine. Then:
```
git clone git@github.com:vas3k/infomate.club.git
cd infomate.club
docker-compose up --build
```
### Run
After that, navigate to [localhost:8000](http://localhost:8000).
```shell script
docker-compose up
```
### Terminate
To terminate:
```shell script
docker-compose down --remove-orphans
```
## Running for development
Make sure you have Python 3 and PostgreSQL installed locally.
#### Step 1: Install requirements
```
pip3 install -r requirements.txt --user
```
#### Step 2: Create a database structure
```
python3 manage.py migrate
```
#### Step 3: Take a look at [boards.yml](boards.yml)
This is the main source of truth for all RSS streams and collections in the service.
All updates to the database are made through it. For a first run, you can just use the existing one.
#### Step 4: Initialize your feeds
```
python3 scripts/initialize.py --config boards.yml
```
> Every time you make a change to boards.yml, just run this script again.
> It is smart enough to create the missing feeds and remove the old ones.
#### Step 5: Fetch some articles
```
python3 scripts/update.py
```
> Don't run it too often, otherwise sites may ban your IP.
> There is a hardcoded cooldown interval for each feed, but you can use the `--force` flag to ignore it.
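
To illustrate the cooldown idea, here is a minimal sketch. The one-hour interval and the `should_refresh` helper are assumptions for the example, not the actual contents of `scripts/update.py`.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical cooldown; the real hardcoded value may differ.
COOLDOWN = timedelta(hours=1)


def should_refresh(
    last_refreshed: Optional[datetime],
    now: datetime,
    cooldown: timedelta = COOLDOWN,
    force: bool = False,
) -> bool:
    """Decide whether a feed is due for another fetch."""
    # A forced run or a never-fetched feed always refreshes.
    if force or last_refreshed is None:
        return True
    return now - last_refreshed >= cooldown
```

A feed fetched ten minutes ago would be skipped on a normal run and fetched again only when `force=True`.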
#### Step 6: Run dev server
```
python3 manage.py runserver 8000
```
Then go to [localhost:8000](http://localhost:8000) again
## boards.yml format
TBD
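
Until the format is documented, the bundled boards.yml is the best reference. One less obvious part is the optional `conditions` list on a feed (`type: in`, `field: link`, `in: "..."`). The sketch below shows one plausible reading of it, where an article is kept only if the given field contains the given substring. The `matches_conditions` helper and these semantics are assumptions, not the project's actual code.

```python
from typing import Dict, List


def matches_conditions(entry: Dict[str, str], conditions: List[Dict[str, str]]) -> bool:
    """Return True if the entry satisfies every condition (assumed semantics)."""
    for condition in conditions:
        value = str(entry.get(condition["field"], ""))
        if condition["type"] == "in":
            # Keep the entry only if the substring occurs in the field.
            if condition["in"] not in value:
                return False
        else:
            raise ValueError(f"Unknown condition type: {condition['type']}")
    return True


# The same condition as in boards.yml, written as a Python dict.
tech_only = [{"type": "in", "field": "link", "in": "https://vc.ru/tech/"}]
```

With `tech_only`, an entry whose link starts with `https://vc.ru/tech/` passes, while a link under any other section of the site is filtered out.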
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
You can help us with open issues too. There's always something to work on.
We don't have any strict rules on formatting, just explain your motivation and the changes you've made in the PR description so that others understand what's going on.
## License
[Apache 2.0](LICENSE) © Vasily Zubarev
> TL;DR: you can modify, distribute and use it commercially,
but you MUST reference the original author or link to the service


@@ -1,46 +1,18 @@
boards:
- name: Технологии
- name: Tech
slug: tech
is_visible: true
is_private: false
is_visible: true # visibility on the main page
is_private: false # logging in is not required to view
curator:
name: Технологии
title: Главные новости
name: Tech
title: Main news
avatar: https://i.vas3k.ru/fhr.png
bio: Подборка основных изданий о технологиях на русском и английском языках
bio: Major technology media in English and Russian
footer: >
это общая подборка популярных технологических СМИ.
Она сделана по итогам опроса в моём телеграм-канале.
Жирным выделяются свежие статьи.
Страница обновляется раз в час.
this is a general selection of popular technology media.
The page is updated once per hour.
blocks:
- name: На русском
slug: ru
feeds:
- name: "vc.ru: Технологии"
url: https://vc.ru
rss: https://vc.ru/rss/all
conditions:
- type: in
field: link
in: "https://vc.ru/tech/"
- name: TJ
url: https://tjournal.ru
rss: https://tjournal.ru/rss/all
- name: "Хабр: лучшее за сутки"
url: https://habr.ru
rss: https://habr.com/ru/rss/best/daily/?fl=ru
- name: iXBT
url: https://www.ixbt.com
rss: http://www.ixbt.com/export/news.rss
icon: https://i.vas3k.ru/fkm.jpg
- name: Tproger
url: https://tproger.ru/
rss: https://tproger.ru/feed/
- name: OpenNet
url: https://www.opennet.ru/
rss: https://www.opennet.ru/opennews/opennews_6.rss
- name: На английском
- name: English
slug: en
feeds:
- name: Hacker News
@@ -52,6 +24,7 @@ boards:
- name: TechCrunch
rss: http://feeds.feedburner.com/TechCrunch/
url: https://techcrunch.com
is_parsable: false # do not try to parse pages, show RSS content only
- name: Engadget
rss: https://www.engadget.com/rss.xml
url: https://www.engadget.com
@@ -86,7 +59,34 @@ boards:
- name: ReadWrite
url: https://readwrite.com
rss: https://readwrite.com/feed/
- name: Микс блогов
- name: Russian
slug: ru
feeds:
- name: "vc.ru"
url: https://vc.ru
rss: https://vc.ru/rss/all
is_parsable: false
conditions:
- type: in
field: link
in: "https://vc.ru/tech/" # just an example, no real benefits
- name: TJ
url: https://tjournal.ru
rss: https://tjournal.ru/rss/all
- name: "Habr.com"
url: https://habr.com
rss: https://habr.com/ru/rss/best/daily/?fl=ru
- name: iXBT
url: https://www.ixbt.com
rss: http://www.ixbt.com/export/news.rss
icon: https://i.vas3k.ru/fkm.jpg
- name: Tproger
url: https://tproger.ru/
rss: https://tproger.ru/feed/
- name: OpenNet
url: https://www.opennet.ru/
rss: https://www.opennet.ru/opennews/opennews_6.rss
- name: Mix
slug: the_mix
feeds:
- url: http://www.rssmix.com/
@@ -95,7 +95,7 @@ boards:
mix:
- http://vas3k.ru/rss/
- http://nedbatchelder.com/blog/rss.xml
- name: Мейнстрим
- name: Mainstream
slug: mainstream
feeds:
- name: "WSJ: Tech"
@@ -109,31 +109,28 @@ boards:
rss: http://feeds.reuters.com/reuters/technologyNews
- name: Вастрик
slug: vas3k
is_visible: true
is_private: true
curator:
name: Вастрик
url: https://vas3k.ru
title: Айти и путешествия
avatar: https://i.vas3k.ru/eb8.png
bio: Веду блог о технологиях, пишу код, отвратительно путешествую и фотографирую это
footer: >
здесь я собрал сайты, которые составляют 90% того, что я читаю постоянно.
Отбор и фильтрация источников — непрерывный процесс для меня, потому их набор постоянно меняется.
Так что следите.
blocks:
- name: How to Berlin
slug: howtoberlin
is_visible: true
is_private: true
is_private: false
curator:
name: Лена How to Berlin
name: How to Berlin
url: https://howtoberlin.de
title: Набор Берлинца
title: Berliner kit
avatar: https://i.vas3k.ru/fev.png
bio: Что читать когда переехал в Берлин и не понимаешь что происходит вокруг
bio: What to read when you've moved to Berlin and don't know what's going on around you
blocks:
- name: Main and expat news
slug: news
feeds:
- name: "Berlin.de"
url: https://www.berlin.de/aktuelles/
rss: https://www.berlin.de/en/news/index.rss
icon: https://i.vas3k.ru/fjc.png
- name: "DW.com"
url: https://www.dw.com/en/top-stories/germany/s-1432
rss: http://rss.dw.com/rdf/rss-en-ger
- name: "TheLocal"
url: https://www.thelocal.de/
rss: https://feeds.thelocal.com/rss/de
is_parsable: false


@@ -1,7 +1,7 @@
version: '3.7'
services:
infomate_app:
infomate_app: &app
build:
context: .
args:
@@ -28,3 +28,10 @@ services:
- POSTGRES_DB=infomate
ports:
- 5432
migrate_and_init:
<<: *app
container_name: infomate_migrate_and_init
restart: "no"
ports: []
command: make migrate feed_init


@@ -41,7 +41,7 @@
{% block footer %}
<div class="footer">
Made by <a href="https://vas3k.ru">Вастрик</a>.<br><br>
Made by <a href="https://vas3k.ru">Вастрик</a>. The project's code is <a href="https://github.com/vas3k/infomate.club">open</a>.<br><br>
The site uses <a href="https://ru.wikipedia.org/wiki/Cookie" target="_blank">cookies</a> for authorization<br> and collects <a href="{% url "privacy_policy" %}">anonymous data</a> for statistics.
{% if me %}
<br><a href="{% url "logout" %}" class="button logout-button">Log out</a>