Friday, May 4, 2012

Login suggestion query


I want to suggest login to user if their first choice was already taken. Suppose, user want to register as "Superman". There is already some Supermans on site. Logins are suggested in form "Superman01", "Superman02" and so on. So, script must:



  • check for 'Superman' login in db

  • if already used, append '01' to login and check for it in DB

  • if already used, increment counter ('02'), append to login and check again

  • when non-claimed login is found, return it to user



What I don't like in this schema right now is that it takes multiple requests to MySQL database. Is there any way to get first unclaimed login in one go? Maybe with stored procedure or clever SQL query?



UPD : offered a bounty


Source: Tips4all

11 comments:

  1. User regexp to find the required matches:

    SELECT .. FROM users WHERE username REGEXP '^superman[0-9]{1,2}'


    This will return all usernames in form of 'supermanX' or 'supermanXX' (one or two digits).

    After you get your results, you can easily find the next-in-line number or the missing ones.

    For more information, read the following:

    http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html

    http://dev.mysql.com/doc/refman/5.0/en/regexp.html



    Edit

    Supposing the table is called 'users' and the field in question is called 'username', a possible code snipped is the following:

    /**
    * Checks a given name exists at the users table
    * and returns possible alternatives
    * or an empty string if no alternatives can be found
    */
    function CheckUsername($name);
    // sanitize
    $query = sprintf("SELECT username FROM users
    REGEXP '%s[0-9]{0,2}' ORDER BY username",
    mysql_real_escape_string($name));

    $result = mysql_query($query);

    // get all possible matches
    $rows = array();
    while (list($match) = mysql_fetch_row($result)) {
    $rows[] = $match;
    }

    if (count($rows) == 0) {
    // no rows found, return the original name
    return $name;

    } else {
    // found multiple rows

    if ($rows[0] != $name) {
    // first check if the original name exists
    return $name;

    } else {
    // else go through each number until we find a good username
    $count = 1;
    while ($counter < count($rows) {
    $test = sprintf("%s%02d", $name, $counter);
    if ($rows[$counter] != $test) return $test;
    $counter++;
    }
    }
    }

    // nothing found
    return '';
    }


    I hope it helps.

    ReplyDelete
  2. Why not just select where login like 'superman%' and iterate over the resultset in your code?

    ReplyDelete
  3. Ask for a hint phrase like this:


    Please additionally provide a hint phrase which you would like to be part of your username in case the one you choose is already taken by somebody else.
    For example, if your name is Joseph, then Joseph, josef or joe would be taken already. So you can provide a hint phrase which could be one of:


    your last name - eg. Smithson - which would suggest "joe.smithson"
    your city of residence - eg. Bay Area - which would suggest "joseph_bayarea"
    the purpose of the account - eg. developer - which would suggest "joseph-devel"
    a color - eg. blue - which would suggest "bluejoe"
    a number - which would be suffixed like "joe99"



    Another way to get this hint information would be to combine other data entered in the sign-up form. I cannot immediately think of any other convenient and general-purpose scheme to guess what the user would like suggested as his username.

    Especially since the service your site provides is not specified.

    Another way of approaching this problem is to see the code behind those "intelligent" captchas that sites like Slashdot generate. Some witty devel ;-) has a bunch of words semantically linked to the topic at hand and uses those phrases for a captcha.

    This smart/intelligent captcha thing is a bit like Google Sets.

    Coding Horror also showed these smart catpchas occasionally.

    Play around with those types of services or get hold of a good database of semantically linked terms. Then link those terms with the hint phrase you ask the user to supply.

    Google does this easily because "All your searches are belong to Google"(TM).
    You have a much easier task - you dont have to crawl the web and you dont have to provide search engine results or links. All you need is a semantically database.

    You could get one is you looked hard enough online.
    You could start with synonyms/antonyms etc.
    IIRC, one such is wordnet, but i don't know the license. So do look it up.



    Additional (optional, but dont implement partially):
    I suggest that if you make such a good thing, make it opensource.
    It will be pretty helpful to others and get you a great rep.
    And make sure to also publish code against automated logins for the inevitable situation where some coder with no ethics and lots of spare time will use the semantically linked open word database to generate registration requests against your app and every other !
    Bots keep getting smarter and smarter.
    Email verification is one protection against this - but that's only if that email service cannot be subverted - which it can, if it is a new email service - which keep coming up all the time.

    So it's a sizeable task if you're going to implement this idea and release it as opensource. Then you have to protect it also.

    Or you can just keep it your own site.

    ReplyDelete
  4. Here is my go at this:

    SELECT `login`
    FROM `usertable`
    WHERE `login` LIKE 'Superman%'
    ORDER BY `login` DESC
    LIMIT 1;


    If the query doesn't returns results $username = 'Superman', otherwise:

    $username = 'Superman' . (strrev(intval(strrev($result['username']))) + 1);


    This should do the trick, however I must say I'm not a big fan of your username picking scheme.



    The revised SQL query, in light of klausbyskov's first comment:

    SELECT `login`
    FROM `usertable`
    WHERE `login` RLIKE '^Superman[0-9]*$'
    ORDER BY `login` DESC
    LIMIT 1;

    ReplyDelete
  5. You can, assuming the login field is properly indexed (which it should be), do:

    select login from usertable where login = 'Superman';


    If no rows are returned, you're done. Otherwise, you will have to check for other possibilities:

    select login from usertable where login like 'Superman%' order by login;


    Now, just find the variation with the highest numerical suffix and add one.

    EDIT:
    One query to the db to check only the actual name is fast, but one query to check all possibilities in a big database will be slow (not because of the like match - it's fast if you are indexed - but rather downloading all of those rows and processing them).

    You would be better off doing 1 query to check the name, then only doing the query to check all names when the desired name doesn't work.

    You can also cache the results of that query so that they can be reused without you having to go back to the DB the next time someone picks a supermanesque name. Just be sure to clear the results whenever you add a similar login name to the db.

    ReplyDelete
  6. If you're willing to store some state in the database...

    When someone registers a username, stick it in the "available" table, which has two columns, "base_name" (string) and "next_available" (integer). If someone registers a username which ends in two digits, look for the base (the part preceding the final two digits), and either insert it into "available" or update "next_available".

    When someone enters a username that isn't available, you can just look it up in the "available" table and give out the base and the next_available suffix. This can be done in one query.

    Caveat: If someone registers "superman93" then you only get 6 more usernames, even if numbers 01 to 92 are available.

    ReplyDelete
  7. Here's my mildly daft solution: add one varchar column (called e.g. username_string_part) to your user table to store the string parts of the username, and a second int column (e.g. username_number_part) to store the numeric part. So superman1 is split into "superman" in the username_string_part column and "1" in username_number_part. Also create an index, possibly over both columns or just over the username_string_part if you're not expecting large numbers of duplicate username_string_part entries. So, in MySQL, your create table is something like this):

    CREATE TABLE `users` (
    `id` int(11) NOT NULL auto_increment,
    `username` varchar(25) NOT NULL default '',
    `username_string_part` varchar(25) NOT NULL default '',
    `username_number_part` int(11) NOT NULL default 0,
    PRIMARY KEY (`id`),
    KEY `ix_username_string_part` (`username_string_part`)
    ) TYPE=MyISAM AUTO_INCREMENT=1;


    (Note that the username "superman" has a default username_number_part of zero - this is important.)

    Once you have a few entries, your data would look something like this:

    +----+-----------+----------------------+----------------------+
    | id | username | username_string_part | username_number_part |
    +----+-----------+----------------------+----------------------+
    | 1 | superman | superman | 0 |
    | 2 | superman1 | superman | 1 |
    | 3 | superman3 | superman | 3 |
    +----+-----------+----------------------+----------------------+


    Then it's a case of selecting minimum value of username_number_part that doesn't have a username_number_part value of "itself plus one" in the database. So for the username "superman":

    select min(username_number_part) + 1 as min_number_available from users
    where username_string_part = 'superman' and username_number_part not in
    (select username_number_part - 1 from users where
    username_string_part = 'superman');


    The return value, min_number_available, is NULL if this is the first instance of that username - so they can have it - or an integer for the next free slot otherwise. You then build the recommended username as "superman" + min_number_available. You could do the concat in the query or not as you like. With the example data above you'll get the value "2" returned.

    Downsides: it's going to add storage (column and index), and slow down inserts very slightly. It also doesn't naturally distinguish between "superman001" and "superman01". (Although it could if you treated leading zeroes as part of the username_string_part, so "superman001" would be split as "superman00" and "1".)

    Upsides: it's a single query on indexed columns.

    After all this, I'd be surprised if a site had so many username duplicates that doing a for loop with multiple database queries was really all that bad.

    ReplyDelete
  8. If you can change the database schema the solution is trivial.

    Split the username into two columns: username and username_suffix (INTEGER).

    If the username_suffix is 0, it isn't displayed. i.e. 'superman' and 'superman0' are equivalent.

    You can then simply

    SELECT MAX(username_suffix)+1 WHERE username = 'superman'


    to get the next available suffix.

    Alternatively if you can't change the database schema, try working probabilistically. Append a random 2 digit number; if that collides with an existing user, append a random 3 digit number instead; if that collides ...

    If you don't mind mildly annoying a small fraction of potential users, just suggesting a username which is the user's proposed username with any trailing digits stripped off and extra random digits appended and not checking the database first will probably work well enough:

    eg.

    superman not available, try superman39... (Try 2 extra digits first)
    superman39 not available, try superman491... (now try 1 extra digit each time)
    superman491 not available, try superman8972... (up to (say) 4 digits)
    superman9872 not available, try superman2758


    A potential user would have to be really unlucky to have to retry more than once or twice.



    For some reason I didn't see @Karl's solution before I wrote this. If the extra db column is the best solution, then he should probably get the credit - although I think this is easier. However, the probabilistic approach makes much more sense to me.

    ReplyDelete
  9. As per the comments on the question, a fixed range of 00 - 99 is desired. You could consider to do a SELECT MAX() on the last two parts of the name.

    SELECT max(convert(substring(name, char_length(username)-1, 2), signed)) AS max
    FROM user
    WHERE name LIKE 'superman%'


    This is however not free of maintenance. What if there are 99 supermans?

    This is also not free of potential collisions/clashes with usernames which already ends with digits like 01010101 and h4xx0r1337. What if there are already superman01 and superman02 and a new (and ignorant) user decides to register as superman88 because s/he is born at 1988; any next superman would get superman89 suggested, leaving a hole between superman02 and superman88.

    It's hard to give a "best" answer on this particular question. The safest way would be something like:

    if (find_user($username) != null) {
    for ($i = 0; $user != null; $i++) {
    $username = $username . $i;
    $user = find_user($username);
    }
    }
    // Now suggest $username.


    There is of course a cost, but it's not shocking. Also think again, how often would this occur? Once a day maybe? Or once a year if your forum get on average only 1 new member per day?

    ReplyDelete
  10. The query below uses an auxiliary table with 10 records (digits '0' to '9') and a cross join to create a list of strings '00' to '99'. These values are concatenated with the user chosen login ('superman') and the result tested to be NOT IN your table of current users. The final result is a list of possible login names ('superman00' to 'superman99') which are not currently in use. You could show the user a few of these to choose. I tested in TSQL, should be easy to translate to MySQL (I think you have to replace 'superman'+T.i+U.i with CONCAT('superman',T.i,U.i) ):

    --- prepare a digits table
    create table digits (i char(1));
    insert into digits (i) values ('0')
    insert into digits (i) values ('1')
    insert into digits (i) values ('2')
    insert into digits (i) values ('3')
    insert into digits (i) values ('4')
    insert into digits (i) values ('5')
    insert into digits (i) values ('6')
    insert into digits (i) values ('7')
    insert into digits (i) values ('8')
    insert into digits (i) values ('9')

    --- This query returns all 'superman00' to 'superman99' records currently not used

    SELECT 'superman'+T.i+U.i AS suggestedlogin
    FROM digits T cross join digits U
    WHERE 'superman'+T.i+U.i NOT IN (
    SELECT login FROM usertable
    )


    (Cross join idea from http://www.tek-tips.com/viewthread.cfm?qid=755853)

    ReplyDelete
  11. Most of these answers are correct but hard-code the requested username in the SQL statement.

    SELECT MAX(SUBSTR(user,LENGTH('{$request}')+1))+1
    FROM users
    WHERE username LIKE '{$request}%'


    Will return a suitable suffix (null if the username is not used already)

    C.

    ReplyDelete