... Uhm, wait ... You're using the STM32F4 Discovery, right?
If so, did you remember to disconnect the on-board SWD programmer?
-If not, it might interfere with your results.
I tried capturing the data with my analyzer, and I can't seem to map it properly to the documentation or to the experience of others (Mark, for instance).
These are the first 400 bits of the transfer:
11111111111111111111111111111111111111111111111111111011110011110011111111111111
11111111111111111111111111111111111111111100101001011001110111000101000000001011
10101000011011000110000000000000000000000000000000100111100101011000100000000000
00000000000000000101001011000110000000000000000000000000000001111001100011011000
10000111100000000000000000000000001111100110000000000000000000000000000000000001
Note: When using HLA_SWD on STM32F4, I get 53 leading ones and 53 trailing ones, but when I use JTAG-lock-pick Tiny 2 with LPC1751, I get 50 leading ones and 50 trailing ones.
What looks odd to me is that there seems to be no turnaround!
-But according to the documentation found here, there should be a turnaround: http://www.arm.com/files/pdf/Low_Pin-Count_Debug_Interfaces_for_Multi-device_Systems.pdf
I get IDCODE 0x2BA01477 for both STM32F4 (which is a Cortex-M4 device) and LPC1751 (which is a Cortex-M3 device).
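For anyone who wants to check the mapping themselves, here is a rough Python sketch of how I read the HLA_SWD / STM32F4 capture above. The function and variable names are my own and purely illustrative. It assumes that the 16-bit pattern between the two runs of ones is the JTAG-to-SWD switch value 0xE79E sent LSB first, that the 8-bit request follows the idle zeros, and (this is the questionable part) that the 3-bit ACK follows the request directly with no turnaround bit in between.

```python
# First three 80-bit rows of the HLA_SWD / STM32F4 capture, copied from the dump above.
CAPTURE = (
    "11111111111111111111111111111111111111111111111111111011110011110011111111111111"
    "11111111111111111111111111111111111111111100101001011001110111000101000000001011"
    "10101000011011000110000000000000000000000000000000100111100101011000100000000000"
)

# 0xE79E written least significant bit first (assumed to be the switch sequence).
JTAG_TO_SWD = "0111100111100111"


def lsb_first(bits):
    """Interpret a bit string whose first character is bit 0."""
    return int(bits[::-1], 2)


def decode_first_read(capture):
    """Decode the first read transfer, ASSUMING zero turnaround bits."""
    switch = capture.index(JTAG_TO_SWD)
    after_switch = switch + len(JTAG_TO_SWD)
    idle = capture.index("0", after_switch)   # end of the second run of ones
    start = capture.index("1", idle)          # start bit of the request
    request = capture[start:start + 8]
    ack = capture[start + 8:start + 11]       # no turnaround bit skipped here
    data = capture[start + 11:start + 43]     # 32 data bits, LSB first
    parity = capture[start + 43]
    return {
        "leading_ones": switch,
        "trailing_ones": idle - after_switch,
        "request": request,
        "ack": lsb_first(ack),
        "idcode": lsb_first(data),
        "parity_ok": data.count("1") % 2 == int(parity),
    }


result = decode_first_read(CAPTURE)
print(result["request"], result["ack"], hex(result["idcode"]), result["parity_ok"])
```

With the rows above, the request comes out as 10100101 (a DP read of address 0), the ACK as 1 (OK), and the data word as 0x2ba01477 with correct parity. So the bits only line up with the IDCODE if no turnaround bit is skipped, which is exactly what puzzles me.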
Here's a similar session with JTAG-lock-pick Tiny 2 and LPC1751:
11111111111111111111111111111111111111111111111111011110011110011111111111111111
11111111111111111111111111111111111100101001011001110111000101000000001011101010
00111000000110011011110000000000000000000000000000000000001011000110010000010000
00000000000000000111101110010101100110000010000000000000000000000000011011000110
00000001000000000000000000000000011110010101100110000000000000000000000000000101
In both cases, there seems to be no turnaround clock. So I believe that right after sending the last bit, before driving SWCLK low, you could try switching the SWDIO direction to input and then issue the SWCLK_LO, which gives 0 turnaround clock cycles.
According to all the documents I found, data is sampled on the rising edge of SWCLK, so the direction should be changed right after the data has been sampled.
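To make the ordering I'm suggesting a bit more concrete, here is a small Python sketch. It does not touch any hardware: RecordingPins is a hypothetical stand-in that only logs the requested pin operations, and all names in it are made up for illustration. It assumes SWCLK is low on entry and that each bit is sampled on the rising edge.

```python
class RecordingPins:
    """Hypothetical stand-in for GPIO access; it only records what was requested."""

    def __init__(self):
        self.log = []

    def swdio_output(self):
        self.log.append("SWDIO=out")

    def swdio_input(self):
        self.log.append("SWDIO=in")

    def swdio_write(self, bit):
        self.log.append(f"SWDIO<-{bit}")

    def swclk_hi(self):
        self.log.append("SWCLK_HI")

    def swclk_lo(self):
        self.log.append("SWCLK_LO")


def send_request_no_turnaround(pins, request_bits):
    """Clock out the request and release SWDIO before the final falling edge."""
    pins.swdio_output()
    last = len(request_bits) - 1
    for i, bit in enumerate(request_bits):
        pins.swdio_write(bit)   # set the bit while SWCLK is low
        pins.swclk_hi()         # rising edge: the bit is sampled here
        if i == last:
            pins.swdio_input()  # release the line right after the last sample
        pins.swclk_lo()         # falling edge, no extra turnaround clock


pins = RecordingPins()
# Request 0xA5 (read DP address 0), LSB first, as seen in the captures.
send_request_no_turnaround(pins, [1, 0, 1, 0, 0, 1, 0, 1])
print(pins.log[-4:])
```

On a real target the same sequence would go through whatever GPIO calls your setup provides. The only point here is the order of the last three operations: SWCLK high, SWDIO to input, SWCLK low.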
So far, I have been unable to do a test with the JTAG-lock-pick Tiny 2 connected to the Discovery board, but if I get it set up, I'll post the output here as well.