I use Test-Driven Development for my natural language parser, in a very orthodox way. And now that I do it, I see that what I had before is just test-first development, not quite test-driven.
I try to do something new in my parser every day. But very often in the evening I see that I won't manage to get the tests pass today. But, if I insert a little hack here and there, then maybe... And, if mocking really helps, I do it, but not without consequences. I add another test, that doesn't pass because of the mocking. Usually I just take the sentence and add a small variation to it and, of course, to its translation (e.g. I can change a tense, a noun person, the word order). After that I may go to sleep untroubled: the new test belongs to the next day, and I have plenty of time to think how to parse in a more generic way. If this time is not sufficient, I can mock further and create one more test. After a series of those variation tests, I get a satisfactory algorithm and quite a coverage. That's how it goes.
This mocking-all-the-time approach seems to be quite a fruitful way to get the things done. Now I really understand those agile guys saying "keep it simple and never change anything without a test". I never can think of all the possibilities, and there's no right way to do things yet. If I implement a nice beautiful logic based on just one sentence, then the next sentence will inevitably prove this code wrong, or too inflexible, or just plain inconvenient. But when I mock things several times, the mocking logic tends to get more and more generic and finally doesn't look like mocking at all, while still being flexible and convenient. Many mocks have already been transformed into serious logic, many others still wait for it.
By the way, it's not about unit-testing at all. Of 175 tests I have right now only 4 are unit tests. They test the longest common subsequence algorithm that I need for ellipsis and which I had to implement myself. There are also 17 parsing tests which check the internal semantic representation produced after parsing. They are useful to keep an eye on the semantics, but their expected output also tends to change insignificantly quite often. Then it's just tedious to update the tests. So I prefer to write translation tests, because their expectations are a golden standard. I could completely change the architecture of the parser without ever touching them.