slow website

by themba, Monday, February 11, 2019, 23:14 (189 days ago)

I've been running selfoss for a while now. It was rather sluggish and I always assumed that was because I was a bit lazy during setup and went with sqlite for storage. I recently migrated to mariadb but was surprised to find no noticeable speedup. In particular, the ajax request takes a long time to load the actual items: ?offset=0&itemsPerPage=50&search=&type=newest&tag=&source=&sourcesNav=false&extraIds%5B%5D=220893&ajax=true

I ran some SQL queries against the database myself, and although I don't know the exact queries selfoss makes, fetching the news items that would normally be shown on a page with a similar query is nice and fast. (I'm only using this myself for some 100 RSS feeds, so we're not talking about huge databases anyway.)

I tried to dabble with the Xdebug profiler, but to be honest I'm not really experienced with reading its output. It seems, however, that the ajax request spends most of its time doing a curl request.

Does selfoss attempt to quickly update my sources when I load the page?
That's not the behaviour I would expect. Is there a config option to change that?

Any further insight much appreciated,
Themba

slow website

by jtojnar, Tuesday, February 12, 2019, 02:09 (189 days ago) @ themba
edited by jtojnar, Thursday, August 08, 2019, 13:09

Hmm, I do not think GET / makes any HTTP requests, and it is pretty fast for me even with hundreds of feeds.

If you do not mind proprietary services, you can try the free variant of https://blackfire.io/docs/introduction, which I find much easier to use.

slow website

by themba, Tuesday, February 12, 2019, 09:16 (189 days ago) @ jtojnar

Thanks for the tip. It'll take some time to try that because I'm currently running on an armv7 VPS. I'll have to migrate my database to my desktop and try Blackfire from there. (There is no armv7 binary for the Blackfire agent, unfortunately.)

slow website

by themba, Saturday, February 16, 2019, 17:31 (185 days ago) @ themba

I have downgraded to echo() profiling because none of the more advanced profilers really work on my Ubuntu LTS (PHP 7.0 :s).

It seems that 11 seconds (99% of the request time) are spent in $tagsDao->getWithUnread();

Possibly the SQL does not get optimized well. This is the query in question, which takes almost the full 11 seconds of the request:
SELECT tag, color, COUNT(items.id) AS unread
FROM tags AS tags,
sources AS sources
LEFT OUTER JOIN items AS items
ON (items.source=sources.id AND items.unread=1)
WHERE CONCAT(',', sources.tags, ',') LIKE CONCAT('%,', tags.tag, ',%') COLLATE utf8mb4_general_ci
GROUP BY tags.tag, tags.color
ORDER BY LOWER(tags.tag);


I'm wondering why selfoss uses this "WHERE ... CONCAT() LIKE CONCAT()" construction to join the tags to the sources rather than a tags-sources association table. The latter is a common structure in relational databases and would probably be optimized much better by the database system.
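For comparison, here is a minimal sketch of what such an association table could look like. This is Python with sqlite3 purely for illustration (the thread is about MySQL/MariaDB, and sqlite uses || instead of CONCAT); the source_tags table and all column names are hypothetical, loosely modelled on the query above. The unread-count query then becomes plain equi-joins with no string matching:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical normalized schema: tags are linked to sources through an
# association table instead of a comma-separated sources.tags column.
cur.executescript("""
CREATE TABLE tags    (tag TEXT PRIMARY KEY, color TEXT);
CREATE TABLE sources (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE source_tags (
    source INTEGER REFERENCES sources(id),
    tag    TEXT    REFERENCES tags(tag),
    PRIMARY KEY (source, tag)
);
CREATE TABLE items (
    id     INTEGER PRIMARY KEY,
    source INTEGER REFERENCES sources(id),
    unread INTEGER
);

INSERT INTO tags VALUES ('news', '#ff0000'), ('tech', '#00ff00');
INSERT INTO sources VALUES (1, 'feed a'), (2, 'feed b');
INSERT INTO source_tags VALUES (1, 'news'), (2, 'news'), (2, 'tech');
INSERT INTO items VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 2, 1);
""")

# Unread count per tag via ordinary joins; the LEFT OUTER JOIN keeps
# tags whose sources have no unread items (their count becomes 0).
cur.execute("""
SELECT t.tag, t.color, COUNT(i.id) AS unread
FROM tags AS t
JOIN source_tags AS st ON st.tag = t.tag
JOIN sources AS s ON s.id = st.source
LEFT OUTER JOIN items AS i ON i.source = s.id AND i.unread = 1
GROUP BY t.tag, t.color
ORDER BY LOWER(t.tag)
""")
rows = cur.fetchall()
print(rows)  # → [('news', '#ff0000', 3), ('tech', '#00ff00', 2)]
```

Every join condition here is a plain equality on an indexed column, which is exactly the shape query planners optimize best.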

slow website

by jtojnar, Sunday, February 17, 2019, 11:13 (184 days ago) @ themba

You might be right. This is very inefficient, and I have wanted to rewrite it into (at least first) normal form for some time now.

slow website

by themba, Sunday, February 17, 2019, 11:45 (184 days ago) @ jtojnar

Hi,

I did a small experiment, manually rewriting my database to contain a categorytags table that joins the tags to the categories.

However, I found that this does not enable faster queries. Apparently MySQL already executes the joins in the optimal order. Joining my 43 sources with the 11 tags in my current database is very fast independent of the way the data is linked... That makes sense now.

I already created an index on the "unread" column of the items table some time ago, hoping to speed things up. Until now I had assumed that counting unread items would not be faster or slower depending on the number of unread items (my thought was: if you have to check them all anyway, it doesn't matter whether a row is counted once or zero times). I had one feed with many unread items that I wasn't following much anymore (y-combinator); it had something like 60000 unread items in it. I marked them all as read and now my queries are much faster again....

Thinking about it again, this makes sense: filtering on a table column is cheap even if the table is large, especially with an index on the column in question. Joining tables is more expensive, but with the filter already applied there are a lot fewer items to join. The other tables in the join (tags and sources) are very small, so the join is not too bad performance-wise (in my case each source has exactly one tag, so this join does not grow the result at all). Most of the time is probably spent counting and aggregating the unread counts, which means looping over the result of the join and probably cannot make use of any kind of index. I'm not sure there exist index types that can solve this...

Maybe a stats table as suggested here could be an elegant solution: https://stackoverflow.com/questions/19267507/how-to-optimize-count-performance-on-innodb-by-using-index
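The counter-table idea from that answer could be sketched roughly like this (again Python/sqlite3 for illustration only; the source_stats table, trigger names, and schema are all hypothetical, and a real version would also need a trigger for deletes): triggers keep a pre-aggregated unread count per source in sync, so reads never run the expensive COUNT over the items table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, source INTEGER, unread INTEGER);

-- Hypothetical stats table: one pre-aggregated unread count per source.
CREATE TABLE source_stats (source INTEGER PRIMARY KEY, unread_count INTEGER);
INSERT INTO source_stats VALUES (1, 0), (2, 0);

-- Inserting an unread item bumps the counter for its source.
CREATE TRIGGER items_insert AFTER INSERT ON items WHEN new.unread = 1
BEGIN
    UPDATE source_stats SET unread_count = unread_count + 1
    WHERE source = new.source;
END;

-- Marking an item as read decrements it again.
CREATE TRIGGER items_read AFTER UPDATE OF unread ON items
WHEN old.unread = 1 AND new.unread = 0
BEGIN
    UPDATE source_stats SET unread_count = unread_count - 1
    WHERE source = new.source;
END;
""")

cur.executemany("INSERT INTO items VALUES (?, ?, ?)",
                [(1, 1, 1), (2, 1, 1), (3, 2, 1)])
cur.execute("UPDATE items SET unread = 0 WHERE id = 2")

# Reading the counts is now a trivial primary-key scan of a tiny table.
rows = cur.execute(
    "SELECT source, unread_count FROM source_stats ORDER BY source"
).fetchall()
print(rows)  # → [(1, 1), (2, 1)]
```

The trade-off is the usual one for counter caches: writes get slightly more expensive and the counters can drift if rows are ever changed outside the triggers, but reads become O(number of sources) instead of O(number of unread items).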


You might also check whether it helps to loop over the tags in PHP and count the matching items for each tag in a separate SQL statement. That way SQL might be able to make better use of the index on source (assuming there is an index on that column).
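A rough sketch of that per-tag loop, in Python/sqlite3 rather than PHP for illustration (the schema and the composite index are assumptions; sqlite's || replaces MySQL's CONCAT). Each iteration runs one small query whose inner LIKE only scans the tiny sources table, while the outer count can use the index on (source, unread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- sources.tags is comma-separated, as in selfoss's current schema.
CREATE TABLE sources (id INTEGER PRIMARY KEY, tags TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, source INTEGER, unread INTEGER);
CREATE INDEX items_source_unread ON items (source, unread);

INSERT INTO sources VALUES (1, 'news'), (2, 'news,tech');
INSERT INTO items VALUES (1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 2, 0);
""")

tags = ['news', 'tech']
counts = {}
for tag in tags:
    # One small query per tag: the subquery string-matches only the few
    # source rows; the outer COUNT walks the (source, unread) index.
    cur.execute("""
        SELECT COUNT(*) FROM items
        WHERE unread = 1 AND source IN (
            SELECT id FROM sources
            WHERE ',' || tags || ',' LIKE '%,' || ? || ',%'
        )
    """, (tag,))
    counts[tag] = cur.fetchone()[0]

print(counts)  # → {'news': 3, 'tech': 2}
```

This trades one big join for N small indexed lookups, which can win when N (the number of tags) is small, at the cost of N query round-trips.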

slow website

by niol, Monday, February 18, 2019, 09:59 (183 days ago) @ jtojnar

A long time ago I spent some time optimizing the queries and cut the times to a third. At the same time, I also experimented with using a tag association table ( https://sml.zincube.net/~niol/repositories.git/selfoss/log/?h=tag-assoc-table ) and saw that there was no performance improvement.

Implementing the *WithUnread() functions was part of that optimization. Before, there was a loop in PHP. Trust me, it was really worth it.

Regarding optimizing this, maybe a subquery? Otherwise a cache.

powered by my little forum