Monday, April 23, 2012

Improving MySQL tables with Indexes


I am very new to Indexes in MySQL. I know, I should probably have leart it earlier, but most projects been small enough for me to get away with out it ;)



So, now I am testing it. I did my test by running EXPLAIN on a query:



Query:




EXPLAIN SELECT a . *
FROM `tff__keywords2data` AS a
LEFT JOIN `tff__keywords` AS b ON a.keyword_id = b.id
WHERE (
b.keyword = 'dog' || b.keyword = 'black' || b.keyword = 'and' || b.keyword = 'white'
)
GROUP BY a.data_id
HAVING COUNT( a.data_id ) =4



First, without indexes I got these results:



enter image description here



Then, with index on data_id and keyword_id i got this:



enter image description here



So as I understand, the number of rows MySQL has to search goes from 61k down to 10k which must be good right?



So my question is, am I correct here? And is there anything else I could think about when trying to optimize?



UPDATE:



Further more, after some help from AJ and Piskvor pointing out my other table and its column keyword not having index I got this:



enter image description here



Great improvement! Right?


Source: Tips4all

6 comments:

  1. As you see, the key used for table b is still NULL. You may want to add an index on b.keyword and match with

    WHERE b.keyword IN ('dog','black','and','white')


    This is functionally different from your WHERE clause, although it returns the same results.

    As it looks, you may be interested in fulltext searching.

    ReplyDelete
  2. Depending on what you want to achieve, you should either replace the LEFT JOIN with the INNER JOIN or move your WHERE condition into the ON clause:

    As it is now:

    SELECT a.*
    FROM `tff__keywords2data` AS a
    LEFT JOIN
    `tff__keywords` AS b
    ON b.id = a.keyword_id
    WHERE b.keyword = 'dog' || b.keyword = 'black' || b.keyword = 'and' || b.keyword = 'white'
    GROUP BY
    a.data_id
    HAVING COUNT( a.data_id ) = 4


    your query is in fact an INNER join (since you have non-null conditions in the WHERE clause).

    Also, instead of using bit arithmetics (which is not sargable) you should use native OR or IN constructs:

    SELECT a.*
    FROM `tff__keywords2data` AS a
    JOIN `tff__keywords` AS b
    ON b.id = a.keyword_id
    WHERE b.keyword IN ('dog', 'black', 'and', 'white')
    GROUP BY
    a.data_id
    HAVING COUNT(*) = 4


    You may also want to create an index on ttf__keywords (keyword) which can filter on the keywords you are searching for and make less records to be selected from the leading b.

    Finally, if you don't need implicit ordering on a.data_id, get rid of it by appending ORDER BY NULL:

    SELECT a.*
    FROM `tff__keywords2data` AS a
    JOIN `tff__keywords` AS b
    ON b.id = a.keyword_id
    WHERE b.keyword IN ('dog', 'black', 'and', 'white')
    GROUP BY
    a.data_id
    HAVING COUNT(*) = 4
    ORDER BY
    NULL


    This will remove filesort from your plan.

    ReplyDelete
  3. Yep thats improved (but from quickly looking i think can be more improved). what you can see is that the query optimiser is now seeing AND USING keyword_id index. it has reduced the rows its searching from 64283 down to 10216. but this is still using a filesort which hopefully someone else can clarify is similar to a SQL Server table scan? which isn't good... i could be wrong there though.

    You should be able to now reduce the rows from table b down below 10216

    ReplyDelete
  4. You're doing a string comparison to b.keyword....add an index there.

    ReplyDelete
  5. Use an INNER JOIN instead of a LEFT JOIN. A left join will return unmatched rows in the join table which I don't think you need here.

    ReplyDelete
  6. Try putting indexes on everything in a WHERE clause, and anything in a JOIN, so that would be:

    a.keyword_id
    b.id
    b.keyword

    You may also want to try adding an index to a.data_id, as it's in a "GROUP BY". Too many indexes is usually not a problem, unless you're adding large volumes of data to large tables - that can cause INSERTs to be very slow.

    ReplyDelete