Creating a profanity filter for user generated content
Hi guys!
I'm working on a Java project where I need to create a filter for uploaded names.
The names are uploaded to the db from all over the world, so, for example, I don't want to blacklist a name like "Alassandra", which has "***" in it.
I would appreciate some guidance, pointers on where to look, or algorithm tips for creating this type of filter.
Regards
Arvin
Re: Creating a profanity filter for user generated content
The first thing to decide on is the exact rules you want to enforce.
In the example you gave, you clearly won't allow the 3-letter word on its own but will allow it when it's part of another word. But what if it is embedded in other characters, such as 123xxx456, xxx.com or *xxx*?
BTW, when writing this I used the actual word rather than xxx, and this site allowed the first example but not the other two.
I'd start by searching for something like "banned word filter" or "bad word lists" and see what is available.
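To make the rules question concrete, here is a rough Java sketch of two possible policies. It is only an illustration, not a recommendation: the class name, the method names, the one-entry word list (with "xxx" standing in for a real word, as above) and the test name "Maxxxine" are all made up for this example. Policy A rejects the word only when it sits between regex word boundaries, so digits next to it hide it. Policy B rejects it whenever it is not directly next to a letter, so 123xxx456 is caught as well.

```java
import java.util.List;
import java.util.regex.Pattern;

class NameFilterSketch {
    // Hypothetical word list; "xxx" is a placeholder for a real banned word.
    static final List<String> BANNED = List.of("xxx");

    // Returns true if any banned word, wrapped in the given regex prefix and
    // suffix, is found somewhere in the name (case-insensitive).
    static boolean anyMatch(String name, String prefix, String suffix) {
        for (String word : BANNED) {
            Pattern p = Pattern.compile(
                    prefix + Pattern.quote(word) + suffix,
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
            if (p.matcher(name).find()) {
                return true;
            }
        }
        return false;
    }

    // Policy A: the word must stand between word boundaries.
    // Letters and digits touching the word both prevent a match.
    static boolean matchesWholeWord(String name) {
        return anyMatch(name, "\\b", "\\b");
    }

    // Policy B: the word must not be directly preceded or followed by a letter.
    // Digits and punctuation around the word do not hide it.
    static boolean matchesBetweenNonLetters(String name) {
        return anyMatch(name, "(?<!\\p{L})", "(?!\\p{L})");
    }

    public static void main(String[] args) {
        // "Maxxxine" is an invented name that happens to contain the placeholder.
        String[] samples = {"xxx", "Maxxxine", "123xxx456", "xxx.com", "*xxx*"};
        for (String s : samples) {
            System.out.println(s + "  A=" + matchesWholeWord(s)
                    + "  B=" + matchesBetweenNonLetters(s));
        }
    }
}
```

The two policies only disagree on inputs like 123xxx456, which is exactly the kind of case you need to decide on before picking one. Neither handles deliberate obfuscation such as swapped characters or inserted spaces.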
Re: Creating a profanity filter for user generated content
Yes, you're probably right. I have looked for an algorithm or solution for a filter, but I'm feeling a little lost. Does anybody have tips on algorithms or an API? I would really appreciate it!
Re: Creating a profanity filter for user generated content
Edit:
Maybe I should just use regex for filtering?
Re: Creating a profanity filter for user generated content
Quote:
Maybe I should just use regex for filtering
You are confusing implementation details with analysis and design details. How you are going to implement it in code is not relevant at this stage. You need to decide what you are trying to achieve first (Analysis), then decide how you will solve the problem (Design), and finally translate the design into code (Implementation).
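One way to keep the analysis separate from the implementation is to write the decisions down as example inputs with the expected outcome, and only then check whatever implementation you pick against them. The Java sketch below assumes three decisions purely for illustration (nothing in this thread settles them), uses "xxx" as a placeholder word and "Maxxxine" as an invented name, and all class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

class FilterSpecSketch {
    // Analysis: example names mapped to "should this be rejected?".
    // These three decisions are assumptions made up for this sketch.
    static Map<String, Boolean> exampleSpec() {
        Map<String, Boolean> spec = new LinkedHashMap<>();
        spec.put("xxx", true);        // the bare word is rejected
        spec.put("Maxxxine", false);  // embedded in a longer name is allowed
        spec.put("xxx.com", true);    // followed by punctuation is rejected
        return spec;
    }

    // Checks a candidate implementation against the spec and returns the
    // first input where it disagrees, or empty if it agrees everywhere.
    static Optional<String> firstDisagreement(Map<String, Boolean> spec,
                                              Predicate<String> rejects) {
        for (Map.Entry<String, Boolean> e : spec.entrySet()) {
            if (rejects.test(e.getKey()) != e.getValue()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A naive "contains" check, standing in for any implementation.
        Predicate<String> naive = name -> name.toLowerCase().contains("xxx");
        System.out.println(firstDisagreement(exampleSpec(), naive));
    }
}
```

With the spec written first, regex, a word list lookup or a third-party library all become interchangeable candidates that either satisfy the examples or don't.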