Author Archive

Reset MySQL root password if you forgot it #mysql

April 12th, 2010 No comments

I just needed to reset the MySQL root login password on a server. After a bit of Googling, this is how you do it (I work on Ubuntu, so you may have to tinker with the lines slightly depending on your distribution):

  1. Stop the current MySQL instance from running:
    /etc/init.d/mysql stop
  2. Run mysql with --skip-grant-tables
    /usr/bin/mysqld_safe --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --skip-grant-tables &
  3. Log into mysql, straight into the mysql database
    mysql -u root mysql
  4. Reset the root password.
    UPDATE user SET Password=PASSWORD('newrootpassword') WHERE User='root';
  5. Flush privileges
    flush privileges;
  6. Shut down the MySQL instance you just started.
    /etc/init.d/mysql stop
  7. Start up mysql as usual.
    /etc/init.d/mysql start

Et voilà, you now have a new root password without needing to know the old one!

Categories: Programming, Support Tags:

xbox360 to get USB storage support – finally!

March 27th, 2010 No comments

Starting on April 6th, an upgrade will be pushed out for the xbox360 enabling the use of USB storage devices between 1 and 16 gigabytes. So provided you store your save games and profile on there, if you were to get the dreaded RROD, you’d not lose a thing and could continue to play on another xbox!

Categories: Gaming Tags:

Large Hardon Collider breaks energy record

March 21st, 2010 No comments

Yes, it’s finally happened, The Telegraph website made the typo when referring to the LHC:

[Image: 500x_4445007033_8f5a95dc22]

Categories: News, Random Stuff Tags:

Regex-fu #PHPUK2010

February 26th, 2010 No comments

Good start: don’t use regular expressions unless you need to, as there are plenty of alternatives, e.g. DOMXML, str_replace, etc. PHP5+ also has lots of filters for email validation, URL validation and so on: function calls you can make rather than writing complex regular expressions. Regular expressions can slow down quickly due to backtracking, pattern complexity and long strings.

Then the talk became abstract, with each point prefixed by an odd statement such as “Only elephants remember everything” and “Not all matches are made in heaven”. People are getting it, but everything needs explaining before they do!

One very good point that I have seen ignored many times is “try not to be greedy.” For example, /<(.+)>/ applied to the string <a href="">fdsfsd</a> will match the entire thing. To avoid that, use either the lazy quantifier /<(.+?)>/ or a negated character class /<([^>]+)>/ . Greedy matches can be 20+ times slower.
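
To see the difference for yourself, here is a minimal sketch in Python (my choice of language for illustration; the talk was about PHP, but the quantifier behaviour is the same in both regex flavours). It runs the three patterns against the example string and captures what ends up inside the group.

```python
import re

html = '<a href="">fdsfsd</a>'

# Greedy: .+ runs to the end of the string, then backtracks to the LAST '>'.
greedy = re.search(r"<(.+)>", html).group(1)

# Lazy: .+? stops at the FIRST '>' it can.
lazy = re.search(r"<(.+?)>", html).group(1)

# Negated character class: can never run past a '>', so no backtracking is needed.
negated = re.search(r"<([^>]+)>", html).group(1)

print(greedy)
print(lazy)
print(negated)
```

The greedy pattern captures everything up to the final closing bracket, while the other two stop at the end of the opening tag.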

Categories: Programming Tags:

#PHPUK2010 Part 2 (MySQL stuff)

February 26th, 2010 No comments

Just picked up a nice tidbit on creating a unique index on a two-column table where a pair of values may be stored either way around but you only ever want one row for that pair. In other words, inserting 2,1 and then 1,2 would result in only the first of the two inserts succeeding.

CREATE UNIQUE INDEX ON tablename (LEAST(col1,col2), GREATEST(col1,col2));
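
Here is a rough way to try the idea without a database server. It is only a sketch under a few assumptions: it uses Python's built-in sqlite3 module as a stand-in, SQLite's two-argument min() and max() in place of LEAST and GREATEST, and a made-up table name (links) and index name (links_pair).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (col1 INTEGER, col2 INTEGER)")

# Index the pair in a canonical order (smaller value first), so that
# (2, 1) and (1, 2) produce the same index entry.
conn.execute(
    "CREATE UNIQUE INDEX links_pair "
    "ON links (min(col1, col2), max(col1, col2))"
)


def try_insert(a, b):
    """Return True if the row was inserted, False if the index rejected it."""
    try:
        conn.execute("INSERT INTO links VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False


results = [try_insert(2, 1), try_insert(1, 2), try_insert(1, 3)]
row_count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
print(results, row_count)
```

The second insert is the same pair the other way around, so it is the one that gets rejected.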

Also, WITH: I’ll be honest, I had never thought about using it to create temporary views. This is a bad example, but it shows the structure rather well:

WITH tempView (a,b) AS (
SELECT table1.col1, table2.col2
FROM table1
LEFT JOIN table2
ON table1.id=table2.id
)
SELECT a,b FROM tempView;

Better yet is changing this to WITH RECURSIVE tempView and then adding a select inside the WITH that refers back to tempView. The great example he gave was getting flights from A to B with a varying number of stops: it would be possible to get all routes from A to B with one MySQL query, as long as the stored data included all connecting routes.
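
I didn't note down the speaker's actual query, so the following is my own sketch of the flights idea rather than his. It assumes a hypothetical flights table with src and dst columns, and runs against Python's built-in sqlite3 module. The function name find_routes, the airport codes and the max_stops cap are all made up for illustration.

```python
import sqlite3


def find_routes(flights, origin, destination, max_stops=3):
    """Return (path, stops) rows for every route from origin to destination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO flights VALUES (?, ?)", flights)
    rows = conn.execute(
        """
        WITH RECURSIVE routes (path, endpoint, stops) AS (
            -- Starting point: every direct flight out of the origin.
            SELECT src || '>' || dst, dst, 0
            FROM flights
            WHERE src = ?
            UNION ALL
            -- Recursive step: extend each unfinished route by one more flight,
            -- skipping airports already on the path to avoid going in circles.
            SELECT routes.path || '>' || flights.dst,
                   flights.dst,
                   routes.stops + 1
            FROM routes
            JOIN flights ON flights.src = routes.endpoint
            WHERE routes.stops < ?
              AND routes.endpoint <> ?
              AND instr('>' || routes.path || '>',
                        '>' || flights.dst || '>') = 0
        )
        SELECT path, stops FROM routes
        WHERE endpoint = ?
        ORDER BY stops, path
        """,
        (origin, max_stops, destination, destination),
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data: each tuple is one direct flight.
flights = [
    ("LHR", "CDG"), ("CDG", "JFK"), ("LHR", "JFK"),
    ("LHR", "AMS"), ("AMS", "CDG"), ("JFK", "LHR"),
]
for path, stops in find_routes(flights, "LHR", "JFK"):
    print(stops, path)
```

The select inside the WITH that mentions routes is the part that "refers back" to the view: each pass adds one more leg to the routes found so far.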

Incidentally, while there is some great stuff coming out of this RDBMS talk, I think the queries are really hurting a lot of people’s heads. Good stuff though.

Categories: Programming Tags:

#PHPUK2010 Part 1

February 26th, 2010 No comments

Josh began with the dictionary definition of simplicity (as given by Wikipedia), pointing out that the word is often used in a derogatory way. He then went on to “clarity of expression”, and said that striving for it while programming is something a lot of people do but never quite seem to achieve.

He gave an example where a user comes to a programmer asking for a report, and the usual first reaction is “ah, you need a reporting system.” He said that’s not always the case: at the end of the day, the user just wanted a report. At this point I heard quite a few people take a breath in through their teeth (particularly the guy sitting to the right of me, he knows who he is). That is a hard problem, particularly at Stickyeyes, where we really do get a lot of people saying “I want a report” and we often have to build a system, simply because of the sheer number of similar, repetitious reports.

He made a very good point about developers having a tendency to go for the newest, shiniest tools (such as HipHop for PHP). His point was that these tools exist because they solve a particular type of problem, so unless the tool actually helps you, do you really need to use it?

Categories: Programming Tags:

Photo of Ant Holding 500 times its own Bodyweight

February 21st, 2010 No comments

This amazing picture of an ant holding 500 times its own bodyweight while upside down was taken by zoology specialist Dr Thomas Endlein of Cambridge University while researching creatures’ sticky feet.

This photo won him £700 in photographic vouchers from the Biotechnology and Biological Sciences Research Council.

His hope is that studying the way ants’ feet self-clean and change their size to support varying weights will help develop new adhesives.

“The pads on ants’ feet are self-cleaning and can stick to almost any type of surface,” he said.

“No man-made glue or adhesive system can match this. Understanding how animals can control their adhesive systems should help us come up with clever adhesives in the future.”

Categories: News, Science Tags:

Find Music by Humming – it really works! #midomi

February 1st, 2010 No comments

We are sitting on the sofa at home at the moment discussing our holiday plans for the year and I got a song in my head that I started humming. Remembering the adverts on TV where you hold your phone up to a speaker and it tells you the song, I thought there must be one that you can hum to and it’ll find the song. So I did a quick Google and found Midomi. I hummed the song (and I can’t really sing or even hum very well) and sure enough… it found the exact song: 2 results came back and it was the second! We’re both amazed by how well it worked.

So if you get a song stuck in your head and you can hum it, even if you’re quite bad at humming, give this website a go, chances are you’ll be amazed.

Incidentally, the song it found (which was even more impressive) was Kaoma – Lambada, which I’ve included below.

Categories: Random Stuff Tags:

MySQL and Binary(16) – The Reasons/Benefits/Drawbacks (#mysql)

January 31st, 2010 No comments

I recently posted an article about using BINARY(16) for storing MD5s as unique identifiers instead of simple integer IDs (usually auto-increment). In that article I touched on one of the benefits, reducing JOINs, but there are other reasons for doing it too, so I thought I’d post an article purely about the reasons behind using BINARY(16).

As I discussed in my previous article, an MD5 string is actually a hexadecimal number with 340,282,366,920,938,463,463,374,607,431,768,211,456 (2^128) possible values. MySQL doesn’t have any efficient integer field for storing numbers this big, so you have two choices for storage: a CHAR(32) or a BINARY(16). If you convert a hexadecimal MD5 into an unhexed binary string, it becomes 16 bytes rather than 32. MySQL handily has a built-in function for this called UNHEX.
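
If you want to check the sizes for yourself, here is a small sketch using Python's hashlib (my choice of language for illustration; it is not what we use in production). The username is just sample input. It shows the 32-character hex form, the 16-byte raw form, and that "unhexing" the former gives you the latter.

```python
import hashlib

name = "John.Doe"  # sample input, any string will do

hex_digest = hashlib.md5(name.encode("utf-8")).hexdigest()  # what CHAR(32) would hold
raw_digest = hashlib.md5(name.encode("utf-8")).digest()     # what BINARY(16) would hold

# Converting the hex string back to bytes is the Python analogue of UNHEX().
unhexed = bytes.fromhex(hex_digest)

# The largest value 16 bytes can represent is 2^128 - 1.
max_value = int.from_bytes(b"\xff" * 16, "big")

print(len(hex_digest), len(raw_digest), unhexed == raw_digest)
print(max_value + 1)
```

So the binary form carries exactly the same information as the hex string in half the space.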

So, why use BINARY(16) as a unique field for data storage? Databases like MySQL have superb functionality such as JOIN, allowing you to query one table and “join” the results of that query to another table. However, when you get to tens, hundreds or even thousands of millions of rows, JOINs become expensive, especially when the join only exists because you need an ID field from one table to query against another. In tests at work, replacing a JOIN with a BINARY(16) unique identifier gave noticeable speed improvements; noticeable here meaning human-noticeable, not “iterate it a million times and you’ll see 1.5 as opposed to 1.9 seconds” noticeable.

The main benefits include:

  • Fast queries against any table when you know the formula used to build the MD5 BINARY(16) key: you query with the human-readable string rather than an integer ID.
  • Complete disassociation of relational data values
  • Ability to use INSERT IGNORE to avoid duplicate data without having to use overly large indexes
  • More unique values than even a BIGINT.

The main drawbacks include:

  • 12 bytes more storage for the ID (INT is 4 bytes)
  • No auto-incrementation
  • Completely unreadable to humans when the data is in BINARY(16) form.

One thing I just mentioned was disassociation of relational data values. What does this mean exactly? Well, to be honest, it means exactly the same as what people do now with MySQL and unique integer IDs! The difference is that a lot of the time you can query against it without those pesky JOINs. For example, say you are storing every town in the UK in a database, along with how they link together (i.e. whether there is a direct route from one to another). You’d probably have a table named towns, with a unique ID and the town name. You’d then have a separate table with 2 columns, both storing a town ID, where each row basically means “this town has a direct route to this town.” If you were to use integers as the town’s unique ID, every time you wanted to get the towns linked to a given town, you’d have to query against the towns table first to get the ID of the town you want links for, then again to get the names of the towns that link to it.

If you were to use a BINARY(16) representation of the town, you could scrap the first join. Instead you could query by saying “get me any towns that link to UNHEX(MD5('Town Name'))”. You’d still have to do the second join to get the town names, but you’ve instantly dropped a JOIN and simplified the whole experience, as you can now query more naturally.

Basically, all you’re doing is taking any place in your database that holds a string (usually one longer than 16 characters) and replacing it with a BINARY(16) hash of that string, then storing the strings elsewhere for when you actually need to read the output. This effectively gives you a look-up table that can contain any string whatsoever, and a database that stores relationships between strings without requiring special tables and integers for every string.
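
To make the towns example concrete, here is a rough sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, BLOB columns in place of BINARY(16), and hashlib's raw MD5 digest in place of UNHEX(MD5(...)). The table layout, the town_key helper and the sample towns are all made up for illustration.

```python
import hashlib
import sqlite3


def town_key(name):
    """16-byte MD5 of the town name, the equivalent of UNHEX(MD5(name))."""
    return hashlib.md5(name.encode("utf-8")).digest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE towns (id BLOB PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE routes (from_id BLOB, to_id BLOB)")

for name in ("Leeds", "York", "Harrogate"):
    conn.execute("INSERT INTO towns VALUES (?, ?)", (town_key(name), name))
for a, b in (("Leeds", "York"), ("Leeds", "Harrogate")):
    conn.execute("INSERT INTO routes VALUES (?, ?)", (town_key(a), town_key(b)))

# No look-up of the starting town's ID is needed: the key is computed
# straight from its name. Only the join for the linked names remains.
linked = [
    row[0]
    for row in conn.execute(
        "SELECT towns.name FROM routes "
        "JOIN towns ON towns.id = routes.to_id "
        "WHERE routes.from_id = ? ORDER BY towns.name",
        (town_key("Leeds"),),
    )
]
print(linked)
```

Notice that the routes table could be filled in without ever reading the towns table, since both sides of each link are derived from the names alone.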

As a note, we have a table with 100 million rows and two columns (BINARY(16), TEXT) for looking up the textual value of a BINARY(16) key. A look-up takes 0.0019 seconds for us, and having that table of text has let us severely de-dupe our database, as the data we store is often identical even when the source is completely different. Even if we do a WHERE BINARY(16) IN (list,of,values), the time sticks at 0.0019 seconds up to the largest test I’ve done so far, which is 100 MD5s.

Categories: Programming Tags:

MySQL – Binary(16) and scalability

January 29th, 2010 No comments

Over the past few months at work, we’ve seen our database grow from silly big to really silly big. It still has a way to go to reach the size of the big boys such as Facebook, but it’s still a database stored in MySQL that most day-to-day PHP programmers would avoid like the plague.

One of the great things about using something like MySQL (and any other “real” database) is the ability to cross-query data, i.e. to grab data from one data-set (table) and join it to another data-set (table) to get a single set of results, either as a combination of the data or the result of an exclusion due to the join. *

However, as tables grow, the time taken to perform queries, particularly in the realm of joins, grows rather quickly. So for example take this query:

SELECT *
FROM table2
LEFT JOIN table1
    ON table1.columnB = table2.columnA
WHERE table1.columnC = 'John.Doe';

Let’s say table1 is a list of all employees in a small business and table2 is a list of their days off, so it’s a one-to-many relationship. Running the above query to get the days off for John.Doe would be pretty quick and most developers would be happy with that. Even if the columns weren’t indexed, the performance of that query (as it’s a small business, and therefore a small dataset) would be more than suitable for any real-world application.

Now imagine a table where, rather than a couple of hundred rows, you have millions or (such as ours) billions of rows of data; as for why we have that much data, that’s for another topic. That join could result in a rather painful execution time. The problem is that you first have to query table1 to get the ID of user ‘John.Doe’ and then use that ID against table2 to get the actual data.

So how can you optimise this? Well, you’ve got three choices. The first would be two queries: one to grab the user’s ID from table1, then the next to grab the user’s data from table2; but that’s 2 queries now. In a lot of places that wouldn’t matter, but we want speed here and fewer hits to MySQL. The second is to store the user’s name in table2 for each day off. That’s duplicating data though, and because (in this case) you’d have a string, it’s not the fastest lookup and it creates rather large indexes when people’s usernames are quite long.

The third option? A unique hash associated with that user. In this case, MD5 the username and store it as BINARY(16). MD5 is, after all, basically a 128-bit number. Most people are used to seeing it as a 32-character string, e.g. 7ecb9bba8130abe56cfd9a8430ca969c. That is just a hexadecimal number though, albeit a very, very big one: it can take 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 distinct values, which for those in the UK is 340 sextillion. MySQL doesn’t really have a suitable INT type for storing a number that big, so it’s best to store it either as a 32-byte string (hexadecimal MD5) or, better yet, as a binary string of 16 bytes.

So how does that change our query now?

SELECT *
FROM table2
WHERE table2.columnA = UNHEX(MD5('John.Doe'));

No more join and only one select. You can look up days off for any user simply by knowing the username. In MySQL, UNHEX(MD5('string')) hashes a string and converts the result to its binary equivalent. In PHP you’d use md5('string', true) or pack('H*', md5('string'));

In all honesty, this isn’t the best use of BINARY(16), but it’s a relatively simple example to follow. For us though, moving away from auto-incrementing IDs towards binary hashes has allowed us to do blind inserts (INSERT IGNORE) and lightning-fast selects where they used to take minutes or even hours. INSERT IGNORE has to be one of the biggest benefits we’ve seen. By setting the primary key to the BINARY(16) column, you can easily guarantee unique data without wasting extra index space, and you only need to query that table when you actually need the data associated with that unique hash. The rest of the time, you can query other tables that relate to that hash without having to do a join.
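
Here is a rough sketch of what a blind insert looks like, under a few assumptions: it uses Python's built-in sqlite3 module rather than MySQL, so the statement is SQLite's INSERT OR IGNORE instead of MySQL's INSERT IGNORE, and the strings table and blind_insert helper are made up for illustration.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# The hash is the primary key, so uniqueness needs no extra index.
conn.execute("CREATE TABLE strings (id BLOB PRIMARY KEY, value TEXT)")


def blind_insert(value):
    """Insert without checking first; return True only if a row was added."""
    key = hashlib.md5(value.encode("utf-8")).digest()
    before = conn.total_changes
    conn.execute("INSERT OR IGNORE INTO strings VALUES (?, ?)", (key, value))
    return conn.total_changes > before


inserted = [blind_insert(v) for v in ("John.Doe", "Jane.Doe", "John.Doe")]
row_count = conn.execute("SELECT COUNT(*) FROM strings").fetchone()[0]
print(inserted, row_count)
```

There is no SELECT before the insert and no error handling around it: the duplicate is silently dropped by the primary key.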

* I would like to point out I am fully aware of people who store data without a dedicated database and use Map-Reduce due to the sheer size of it, however databases like MySQL allow a quick line of text to get the results you want, there’s no further effort involved.

Categories: Programming Tags: