David Yip [Tue, 24 Oct 2017 00:31:59 +0000 (19:31 -0500)]
Only cache the regex text, not the regex itself.
It is possible to cache a Regexp object, but I'm not sure what happens
if e.g. that object remains in cache across two different Ruby versions.
Caching a string seems to raise fewer questions.
David Yip [Sun, 22 Oct 2017 05:24:32 +0000 (00:24 -0500)]
Don't add \b to whole-word keywords that don't start with word characters.
Ditto for ending with \b.
Consider muting the phrase "(hot take)". I stipulate it is reasonable
to enter this with the default "match whole word" behavior. Under the
old behavior, this would be encoded as
\b\(hot\ take\)\b
However, if \b is before the first character in the string and the first
character in the string is not a word character, then the match will
fail. Ditto for after. In our example, "(" is not a word character, so
this will not match statuses containing "(hot take)", and that's a very
surprising behavior.
To address this, we only add leading and trailing \b to keywords that
start or end with word characters.
David Yip [Sat, 21 Oct 2017 19:47:17 +0000 (14:47 -0500)]
Move KeywordMute into Glitch namespace.
There are two motivations for this:
1. It looks like we're going to add other features that require
server-side storage (e.g. user notes).
2. Namespacing glitchsoc modifications is a good idea anyway: even if we
do not end up doing (1), if upstream introduces a keyword-mute feature
that also uses a "KeywordMute" model, we can avoid some merge
conflicts this way and work on the more interesting task of
choosing which implementation to use.
David Yip [Mon, 16 Oct 2017 00:49:22 +0000 (19:49 -0500)]
Allow keywords to match either substrings or whole words.
Word-boundary matching only works as intended in English and languages
that use similar word-breaking characters; it doesn't work so well in
(say) Japanese, Chinese, or Thai. It's unacceptable to have a feature
that doesn't work as intended for some languages. (Moreso especially
considering that it's likely that the largest contingent on the Mastodon
bit of the fediverse speaks Japanese.)
There are rules specified in Unicode TR29[1] for word-breaking across
all languages supported by Unicode, but the rules deliberately do not
cover all cases. In fact, TR29 states
For example, reliable detection of word boundaries in languages such
as Thai, Lao, Chinese, or Japanese requires the use of dictionary
lookup, analogous to English hyphenation.
So we aren't going to be able to make word detection work with regexes
within Mastodon (or glitchsoc). However, for a first pass (even if it's
kind of punting) we can allow the user to choose whether they want word
or substring detection and warn about the limitations of this
implementation in, say, docs.
David Yip [Sun, 15 Oct 2017 08:17:33 +0000 (03:17 -0500)]
Set up /settings/keyword_mutes. #164.
This should eventually be accessible via the API and the web frontend,
but I find it easier to set up an editing interface using Rails
templates and the like. We can always take it out if it turns out we
don't need it.
David Yip [Sun, 15 Oct 2017 07:32:03 +0000 (02:32 -0500)]
Use more idiomatic string concatentation. #164.
The intent of the previous concatenation was to minimize object
allocations, which can end up being a slow killer. However, it turns
out that under MRI 2.4.x, the shove-strings-in-an-array-and-join method
is not only arguably more common but (in this particular case) actually
allocates *fewer* objects than the string concatenation.
Or, at least, that's what I gather by running this:
David Yip [Sun, 15 Oct 2017 01:45:14 +0000 (20:45 -0500)]
Make use of the regex attr_reader. #164.
It would also have been valid to get rid of the attr_reader, but I like
being able to reach inside KeywordMute::Matcher without resorting to
instance_variable_get tomfoolery.
David Yip [Sun, 15 Oct 2017 01:36:53 +0000 (20:36 -0500)]
Rework KeywordMute interface to use a matcher object; spec out matcher. #164.
A matcher object that builds a match from KeywordMute data and runs it
over text is, in my view, one of the easier ways to write examples for
this sort of thing.
David Yip [Mon, 9 Oct 2017 22:28:28 +0000 (17:28 -0500)]
Add KeywordMute model.
Gist of the proposed keyword mute implementation:
Keyword mutes are represented server-side as one keyword per record.
For each account, there exists a keyword regex that is generated as one
big alternation of all keywords. This regex is cached (in Redis, I
guess) so we can quickly get it when filtering in FeedManager.
David Yip [Thu, 19 Oct 2017 15:59:50 +0000 (10:59 -0500)]
Make the compose area optionally scrollable.
On desktop, the compose text box grows to accommodate the content. On
mobile, the text box does not grow to accommodate text context, but does
grow to accommodate images. It is possible in both cases to overflow
the available area, which makes accessing other UI elements (e.g.
visibility setttings) difficult.
This commit makes the compose area optionally scrollable, which allows
those UI elements to remain available even if they go off-screen.
David Yip [Wed, 18 Oct 2017 16:52:34 +0000 (11:52 -0500)]
Update stylesheet imports in glitch components.
Commit 6e5471947438fc5883e72b8184663564ffadee28 moved the Mastodon
variables and mixins deeper in the directory hierarchy; this commit
brings the glitch components in line with that change.
Technowix [Wed, 18 Oct 2017 11:51:30 +0000 (13:51 +0200)]
Revert #5438 for FR (#5450)
As said here https://github.com/tootsuite/mastodon/pull/5438 the point of shortening the timestamp is legit, and after some time of adaptation no mistakes can be mades.
aschmitz [Tue, 17 Oct 2017 09:45:06 +0000 (04:45 -0500)]
Clean up reblog tracking keys, related improvements (#5428)
* Clean up reblog-tracking sets from FeedManager
Builds on #5419, with a few minor optimizations and cleanup of sets
after they are no longer needed.
* Update tests, fix multiply-reblogged case
Previously, we would have lost the fact that a given status was
reblogged if the displayed reblog of it was removed, now we don't.
Also added tests to make sure FeedManager#trim cleans up our reblog
tracking keys, fixed up FeedCleanupScheduler to use the right loop,
and fixed the test for it.
Eugen Rochko [Mon, 16 Oct 2017 18:44:31 +0000 (20:44 +0200)]
Keep references to all reblogs of a status on home feed (#5419)
* Keep references to all reblogs of a status on home feed
When inserting reblog: Add to set of reblogs of this status on
the feed, if original status was present in the feed, add it to
that set as well.
When removing a reblog: Remove it from that set. Take random
remaining item from the set. If one exists, re-insert it into feed,
otherwise do not re-insert anything.
Fix #4210
* When original is removed, toss out reblog references
Eugen Rochko [Mon, 16 Oct 2017 14:08:51 +0000 (16:08 +0200)]
Ensure that feed renegeration restores non-zero items (#5409)
Fix #5398
Ordering the home timeline query by account_id meant that the first
100 items belonged to a single account. There was also no reason to
reverse-iterate over the statuses. Assuming the user accesses the
feed halfway-through, it's better to have recent statuses already
available at the top. Therefore working from newer->older is ideal.
If the algorithm ends up filtering all items out during last-mile
filtering, repeat again a page further. The algorithm terminates
when either at least one item has been added, or if the database
query returns nothing (end of data reached)
unarist [Mon, 16 Oct 2017 13:58:23 +0000 (22:58 +0900)]
Fix un-reblogged status being at wrong position in the home timeline (#5418)
We've changed un-reblogging behavior when we implement Snowflake, to insert un-reblogged status at the position reblogging status existed.
However, our API expects home timeline is ordered by status ids, and max_id/since_id filters by zset score. Due to this, un-reblogged status appears as a last item of result set, and timeline expansion may skips many statuses.
So this reverts that change...reblogged status inserted at corresponding position to its id.