Project | MSX2(+) video to VGA conversion (proof of concept)


Color bus hack
04/11/2021 at 03:05
One very interesting feature of V99X8 VDPs is the "color bus". These 8 pins usually carry the color (or color index) of the pixel being drawn, but can also be used as inputs for external video signals. These modes are described on pg. 109 of the "technical data book".

I neglected to look deeper at the color bus, but fellow Hackaday user tomcircuit gave me a great idea for how to use it. I already had the whole software + hardware + test rig 95% ready, so here are the changes I made to use it.

1. Atrocious hardware hack

This is something that should never be done, but in this case it was the quick and lazy way - I soldered 4 wires directly to bits 3...0 of the color bus to tap into those signals (pins 16, 17, 18, 19).

This creates a 4-bit digital pixel signal. The original project had 3 digital lines (R, G, B), so I had to add one.
```
VDP_I_DIG <= PMOD(4);    -- INPUT!    -- Bit3 from color bus
```
2. Extending the FPGA pixel width from 3 to 4 bits

The "DLCLK" signal is not used in this project. Instead, I recreated it in the FPGA from CPUCLK, and this internal clock can be tweaked using a delay line configured by switches on the FPGA board. This allows "fine tuning" of the timing:
```
i_delayed <= i_line(to_integer(unsigned(switch(7 downto 6) & '1'))); -- use "red" switches
r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));
```
The new "i" line has to be brought to the sampler to be captured. Luckily the MSB of the "color nibble" was free.

Mode	Dual port RAM byte structure	Notes
RGB	0RGB0RGB	MSB is hard coded to 0
Color bus	c3c2c1c0c3c2c1c0	c3 = "i" signal; c2 = pin 17 drives "R" input; c1 = pin 18 drives "G" input; c0 = pin 19 drives "B" input

The net result is a very clean layout of two 16-color pixels per byte in the FPGA dual port video RAM:
```
on_sample_pulse: process(sample_pulse, i, r, g, b, sample)
begin
if (rising_edge(sample_pulse)) then
sample <= sample(3 downto 0) & i & r & g & b;
end if;
end process;
```
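The shift-and-pack behavior of this process can be modeled in software. The following is only a Python sketch of the VHDL above, not part of the project; the function name `shift_sample` is made up for illustration.

```python
def shift_sample(sample: int, i: int, r: int, g: int, b: int) -> int:
    """Model one rising edge of sample_pulse on the 8-bit 'sample' register."""
    nibble = (i << 3) | (r << 2) | (g << 1) | b   # c3 c2 c1 c0
    # The previous pixel moves to the upper nibble, the new one fills the lower.
    return ((sample & 0x0F) << 4) | nibble


# Two consecutive pixels: color 9 (1001) followed by color 15 (1111).
sample = 0
sample = shift_sample(sample, 1, 0, 0, 1)
sample = shift_sample(sample, 1, 1, 1, 1)
print(f"{sample:08b}")   # prints 10011111
```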
3. Color palette update

With 3 bits per pixel directly mapped to R, G, B there is not much to be done in terms of color palette: 000 will logically map to "black" and 111 to "white" etc.

With 4 bits (or more, up to 8), the color bus can be interpreted as carrying an "index", and an external memory (for example 256 * 24 bits) can define the exact color of each index. This is of course easy to do in an FPGA, so here is the mapping I implemented:
```
-- standard TMS9918 16-color palette (http://www.cs.columbia.edu/~sedwards/papers/TMS9918.pdf page 26) 
signal video_color: color_lookup := (
    color_transparent,    -- VGA does not support it, so "black"
    color_black,
    color_medgreen,    
    color_ltgreen,
    
    color_dkblue,
    color_ltblue,    
    color_dkred,    
    color_cyan,    

    color_medred,
    color_ltred,
    color_dkyellow,
    color_ltyellow,

    color_dkgreen,
    color_magenta,
    color_gray,
    color_white
    );
```
With the palette defined above, the VDP color can be described as "any 16 colors out of 256". That is because the palette register is 8 bits wide, defined as:

RRRGGGBB
Here is the definition of the colors used in the palette:
```
constant color_transparent:				std_logic_vector(7 downto 0):= "00000000";
constant color_medgreen: 					std_logic_vector(7 downto 0):= "00010000";
constant color_dkgreen:						std_logic_vector(7 downto 0):= "00001000";
constant color_dkblue:						std_logic_vector(7 downto 0):= "00000010";
constant color_medred:						std_logic_vector(7 downto 0):= "01100000";
constant color_dkred:						std_logic_vector(7 downto 0):= "01000000";
constant color_ltcyan:						std_logic_vector(7 downto 0):= "00001110";
constant color_dkyellow:					std_logic_vector(7 downto 0):= "10010000";
constant color_magenta:						std_logic_vector(7 downto 0):= "01100010";

constant color_black:			std_logic_vector(7 downto 0):= "00000000";
constant color_blue,	color_ltblue:	std_logic_vector(7 downto 0):= "00000011";
constant color_green,	color_ltgreen:	std_logic_vector(7 downto 0):= "00011100";
constant color_cyan:			std_logic_vector(7 downto 0):= "00011111";
constant color_red,	color_ltred:	std_logic_vector(7 downto 0):= "11100000";
constant color_purple:			std_logic_vector(7 downto 0):= "11100011";
constant color_yellow,	color_ltyellow: std_logic_vector(7 downto 0):= "11111100";
constant color_white:			std_logic_vector(7 downto 0):= "11111111";
constant color_ltgray:			std_logic_vector(7 downto 0):= "01101110"; 
constant color_dkgray,  color_gray:	std_logic_vector(7 downto 0):= "10010010";
```
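To make the index-to-color lookup concrete, here is a small Python model of it. The palette values are copied from the VHDL constants above; the helper names `split_rrrgggbb` and `lookup` are hypothetical and exist only in this sketch.

```python
# Palette values copied from the VHDL constants, in index order 0..15.
PALETTE = [
    0b00000000,  # 0  transparent (shown as black)
    0b00000000,  # 1  black
    0b00010000,  # 2  medium green
    0b00011100,  # 3  light green
    0b00000010,  # 4  dark blue
    0b00000011,  # 5  light blue
    0b01000000,  # 6  dark red
    0b00011111,  # 7  cyan
    0b01100000,  # 8  medium red
    0b11100000,  # 9  light red
    0b10010000,  # 10 dark yellow
    0b11111100,  # 11 light yellow
    0b00001000,  # 12 dark green
    0b01100010,  # 13 magenta
    0b10010010,  # 14 gray
    0b11111111,  # 15 white
]


def split_rrrgggbb(value: int) -> tuple:
    """Split an RRRGGGBB byte into (red 0..7, green 0..7, blue 0..3)."""
    return (value >> 5) & 0x7, (value >> 2) & 0x7, value & 0x3


def lookup(color_index: int) -> tuple:
    """Map a 4-bit color bus value to its (red, green, blue) fields."""
    return split_rrrgggbb(PALETTE[color_index & 0x0F])


print(lookup(15), lookup(10))   # prints (7, 7, 3) (4, 4, 0)
```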
With the modified Propeller test code (see below) this gives the following colors (the small yellowish bars at the bottom are my zombie sprite bug in the Propeller code :-) ):

Note that "color 0" ("transparent") magically really works - the VDP simply lets the background color come through (the first, "dark blue" vertical bar in the VDP display window).

4. Test code update

Just a minimal change was needed, to see the 16 colors in action:
```
PRI _colorfulBlocks(color) |x, y, c
  c := 0
  repeat x from 0 to vdp.GraphicsHPixelCount - 1
    repeat y from 0 to vdp.GraphicsVPixelCount - 1
      if (color == vdp#TRANSPARENT)
        vdp.DrawPixel(x, y, x ^ y)
      else
        'vdp.DrawPixel(x, y, ColorPalette8[x & 7])
        vdp.DrawPixel(x, y, x & 15)
        c++
  vdp.WaitASecond
```
The x coordinate (which goes from 0 to 63) is used to set the color 0 ... 15.

No other code changes were done. But the results are much better than with the primitive 1-bit RGB A/D converter:
- 16 colors instead of 8
- no color bleeding or wrongly sampled pics
Here are some examples of demo screens using the color bus (lame pics of the screen, the actual quality is much better):

I have bugs with sprite patterns, but it can be observed that during scrolling the VGA output can display the "picture in flux" - remember that the sampler runs at VDP sync and the VGA at its own sync, and they are completely asynchronous to each other.

5. Conclusion

Sampling the analog RGB outputs of V99X8 VDPs is possible and can lead to an acceptable VGA picture, but it requires higher quality A/D converters, PCBs and connections.

Sampling the color bus, on the other hand, leads to high-quality VGA output even with the most basic hardware, essentially just direct wiring from VDP to FPGA.

I leave it to some hardware wizard to create a V99X8-based VGA board. In its simplest form, such a board could contain only:
- V99X8
- FPGA (depending on the resources can contain VDP "dynamic" RAM, VGA "dual port" RAM, and the sampler / VGA controller described here)
- D/A VGA output circuit and connector
The board could be made to accept various "adapters" such as for RC2014, rosco_m68k, or even directly TMS9918 socket pinout (VGA for TI-99/4A!)
Future improvements
03/29/2021 at 04:41
From the images and demo videos, it is obvious that the video quality is barely acceptable. There are two main problems:
- image sharpness - there is cross-bleeding of colors, noise artifacts etc.
- color resolution - only 8 basic colors are supported
Solutions for image sharpness

The flash A/D as I prototyped is very much a "chewing gum/duct-tape" solution, that can be improved in many ways:
- Put the circuit on a permanent solder board
- Keep wiring trimmed and matched
- Use higher quality potentiometers that allow finer and more stable regulation of threshold voltage
- Introduce external 21.47727MHz crystal to drive the sampler circuit instead of multiplying CPUCLK (which is XTAL/6) by 6 on FPGA
Solutions for color resolution

With a 1-bit flash A/D per color channel, only the following colors can be supported:

RGB	color
000	BLACK
001	DARK BLUE
010	DARK GREEN
011	CYAN
100	DARK RED
101	MAGENTA
110	DARK YELLOW
111	WHITE

For a small improvement in resolution, for example from 1 to 2 bits, an additional LM339 comparator per color channel could be used. However, using 6 LM339s instead of 3 would not double the color resolution. The reason is that 2 LM339s set at 1/3 and 2/3 thresholds would produce only 3 valid combinations:

00	no color
01	color intensity low
10	(ignore, as should not occur: if the higher LM339 is over the threshold, lower must be too)
11	color intensity high

Still, the 6-bit digital color vector obtained this way (3 levels per channel, so 27 combinations) could at least be mapped to a valid 16-color table.
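
The reasoning above can be checked with a few lines of Python. This is only an illustrative sketch of decoding two comparator outputs per channel; `decode_level` is a hypothetical helper, not code from the project.

```python
from itertools import product


def decode_level(high: int, low: int):
    """Decode two comparator outputs (2/3 and 1/3 thresholds) into a level 0..2.

    Returns None for the pattern high=1, low=0, which should not occur.
    """
    if high and not low:
        return None
    return high + low


# Collect the valid levels over all four comparator output patterns.
levels = {decode_level(h, l) for h, l in product((0, 1), repeat=2)} - {None}
colors = len(levels) ** 3   # three independent channels: R, G, B
print(sorted(levels), colors)   # prints [0, 1, 2] 27
```

So two comparators per channel give 27 distinct colors rather than the 64 that a true 2-bit converter per channel would deliver.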

One additional interesting experiment would be to use the popular LM3914 dot-bar driver chip as a flash A/D. Theoretically, full 3-bit A/D conversion could be obtained from its 10 stage outputs.

Video conversion using dual port RAM in FPGA

03/29/2021 at 04:06

The basic approach is essentially the same as described here:

https://hackaday.io/project/176081-tim-011-fpga-based-vga-and-ps2-keyboard-adapter/log/186524-converting-tim-011-video-to-vga

The key differences are:

Parameter	TIM-011	V99X8
Resolution	512*256	256*192 (typically)
Colors	4 (2 bit "intensity")	8 (1 bit per R, G, B)
Pixels per byte	4 (b7:b0 = VvVvVvVv)	2 (b7:b0 = -RGB-RGB)
Pixel clock	12MHz	5.3693175MHz
Data sampler clock	48MHz	21.47727MHz
Horizontal sync	positive HSYNC, video signal has no porches	positive HSYNC, video signal has front and back porch
Vertical sync	positive VSYNC, video signal has no porches	regenerated from CSYNC, video signal has top and bottom porch
Window on VGA	512*256	512*384
Memory used	32k	24k

Refer to the following files for key components:

Sys_TIM011_Mercury

This is the main top-level component. The video signals come in through 8-pin PMOD port:

alias VIDEO_HSYNC: std_logic is PMOD(7); -- BB6 on Anvyl (white)
alias VIDEO_CSYNC: std_logic is PMOD(6); -- BB5 on Anvyl (blue)
alias VDP_B_DIG: std_logic is PMOD(3);     -- "digitized" blue signal (using LM339 1-bit ADC)
alias VDP_G_DIG: std_logic is PMOD(2);     -- "digitized" green signal (using LM339 1-bit ADC)
alias VDP_R_DIG: std_logic is PMOD(1);     -- "digitized" red signal (using LM339 1-bit ADC)
alias VDP_CPUCLK: std_logic is PMOD(0);     -- v9958 pin 8 (XTAL/6 == 3.579545MHz)

(simplified here, the actual code contains overlapped signals for TIM-011 mode)

Out of these signals only VIDEO_HSYNC is used directly, as it is a positive pulse that resets the horizontal scan counter and drives the vertical scan.

VIDEO_CSYNC:

Contains the VSYNC but also the HSYNC signals. To extract only the VSYNC, a simple delay line is used that filters out any pulse shorter than the length of HSYNC (24 pixels = 96 XTALs).

--generate VSYNC by filtering out HSYNC from CSYNC using a delay line
on_vdp_cpuclk: process(reset, VDP_CPUCLK, VIDEO_CSYNC, VIDEO_HSYNC)
begin
    if (rising_edge(VDP_CPUCLK)) then
        csync_line <= csync_line(30 downto 0) & VIDEO_CSYNC; 
    end if;
end process;

vdp_vsync <= not (VIDEO_CSYNC or csync_line(17)); -- 24 pixels long ~ 17 CPUCLK
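
The filtering idea can be tried out in software. Below is a rough Python model of the delay-line filter, assuming CSYNC idles high and pulses low (as the VHDL expression suggests); the `tap` value of 18 clocks is an approximation of `csync_line(17)`, and the helper names are invented for this sketch.

```python
def extract_vsync(csync, tap=18):
    """Return vsync samples for csync samples (1 = idle, 0 = sync pulse).

    'tap' is the delay in CPUCLK cycles. 18 is an assumption approximating
    csync_line(17), where bit 0 holds the sample from one clock ago.
    """
    vsync = []
    for t, now in enumerate(csync):
        delayed = csync[t - tap] if t >= tap else 1   # line assumed idle at start
        # vsync is active only while both the live and the delayed signal are low
        vsync.append(int(not (now or delayed)))
    return vsync


def pulse(length, idle=50):
    """A single low pulse of 'length' clocks surrounded by idle time."""
    return [1] * idle + [0] * length + [1] * idle


# A 16-clock pulse (HSYNC-sized) is rejected, a 40-clock pulse passes through.
print(sum(extract_vsync(pulse(16))), sum(extract_vsync(pulse(40))))   # prints 0 22
```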

VDP_CPUCLK:

This is the master clock used to synchronize the pixel clock. Its frequency is XTAL/6. To get back to XTAL, we multiply by 12 and then divide by 2, using a built-in DCM ("digital clock manager") circuit baked into the Xilinx FPGA. Almost all FPGAs support similar (or PLL) circuits to generate clocks of almost any frequency. However, multiplying by 12 is not perfect; this is noticeable as vertical bars that appear when digitizing the R, G, B signals.

The clock produced (42.95454 MHz) is then divided by 2 but also used to drive delay lines for digitized R, G, B:

on_vdp_xtal_int2: process(VIDEO_HSYNC, vdp_xtal_int2, VDP_R_DIG, VDP_G_DIG, VDP_B_DIG, r_line, g_line, b_line)
begin
--	if (VIDEO_HSYNC = '1') then
--		vdp_xtal_int <= '0';
--	else
		if (rising_edge(vdp_xtal_int2)) then
			vdp_xtal_int <= not vdp_xtal_int;
			r_line <= r_line(6 downto 0) & VDP_R_DIG;
			g_line <= g_line(6 downto 0) & VDP_G_DIG;
			b_line <= b_line(6 downto 0) & VDP_B_DIG;
		end if;
--	end if;
end process;

VDP_R_DIG, VDP_G_DIG, VDP_B_DIG:

These are the "raw" 1-bit color signals from the LM339s. They are not fed directly to the sampler; a bit of timing tweaking is possible by tapping into the delay line. This helps remove some noise by sampling the video signals at a precise moment.

r_delayed <= r_line(to_integer(unsigned(switch(7 downto 6) & '1')));
g_delayed <= g_line(to_integer(unsigned(switch(5 downto 4) & '1')));
b_delayed <= b_line(to_integer(unsigned(switch(3 downto 2) & '1')));

Six switches on the Mercury baseboard select the moment to sample the color signal.
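
Because a constant '1' is appended to each pair of switch bits, only the odd taps of the 8-stage delay line are selectable. A small Python sketch of that index calculation (the function name `tap_index` is made up for this example):

```python
def tap_index(switch_pair: int) -> int:
    """Model to_integer(unsigned(switch(n+1 downto n) & '1'))."""
    return ((switch_pair & 0b11) << 1) | 1


# The four possible settings of one pair of switches:
print([tap_index(s) for s in range(4)])   # prints [1, 3, 5, 7]
```

Each channel therefore has four delay settings, spaced two fast-clock cycles apart.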

With these signals ready, they are fed into the "sampler" component:

offset_vdp <= button(3 downto 0) when (switch_tms = '1') else "0000";
vdp: vdp_sampler2 port map (
		reset => RESET,
		clk => vdp_xtal_int, -- 
		hsync => VIDEO_HSYNC,
		vsync => vdp_vsync,
		pixclk => vdp_pixclk,
		offsetclk => freq4, 
		offsetcmd => offset_vdp, -- in TMS mode move the 0, 0 dot within the window
		r => r_delayed, --VDP_R_DIG,
		g => g_delayed, --VDP_G_DIG,
		b => b_delayed, --VDP_B_DIG,
		a => vdp_sampler_a,
		d => vdp_vram_dina,
		limit => "001110", --switch_limit, 
		we_in => we_in,
		we_out => vdp_sampler_wr_nrd
	);

The sampler takes the following inputs:

- video signals from the V9958, conditioned as described above
- "offset", which is a command to increment / decrement the internal register that determines when the pixel signals start
- "limit", which is a constant that determines when to take a sample of a pixel and write it to dual-port RAM

Outputs:

- a - address to dual-port RAM
- d - data to be written to dual-port RAM
- we_out - write enable to dual-port RAM

vdp_sample2.vhd

The "sampler" circuit is relatively simple. The key to remember is:

4 XTAL = 1 pixel ("sample_pulse")

2 pixel = 1 byte ("write_pulse")

8 XTAL = 1 byte

So in 8 input clock cycles, the R, G, B signals have to be sampled twice and byte containing the xRGBxRGB written once:

-- 8 xtal cycles == 2 pixel clock == 1 byte
on_clk: process(clk, hsync, cnt, r, g, b)
begin
	if (hsync = '1') then
		cnt <= "000";
	else
		if (falling_edge(clk)) then
			cnt <= std_logic_vector(unsigned(cnt) + 1);
		end if;
	end if;
end process;

pixclk <= cnt(1);
write_pulse <= (limit(5) xor clk) when (cnt = limit(2 downto 0)) else '0';
sample_pulse <= (limit(5) xor clk) when (cnt(1 downto 0) = limit(4 downto 3)) else '0';

The exact timing when this happens in 8 cycle sequence is determined by parameter "limit" set as constant from outside (it is somewhat tweakable).
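
To see which counter values enable the two pulses for a given "limit", the decoding can be modeled in Python. This is only a sketch of the two concurrent assignments above; it ignores `limit(5)`, which merely selects the clock polarity, and `pulse_slots` is a hypothetical name.

```python
def pulse_slots(limit: int):
    """Return (sample_slots, write_slots) for a 6-bit 'limit' value.

    Each slot is a value of the 3-bit counter 'cnt' during which the
    corresponding pulse is enabled.
    """
    sample_phase = (limit >> 3) & 0b11   # limit(4 downto 3)
    write_phase = limit & 0b111          # limit(2 downto 0)
    sample_slots = [c for c in range(8) if (c & 0b11) == sample_phase]
    write_slots = [c for c in range(8) if c == write_phase]
    return sample_slots, write_slots


# The constant used in the port map above is "001110".
print(pulse_slots(0b001110))   # prints ([1, 5], [6])
```

With this setting the pixel is sampled at counts 1 and 5, and the completed byte is written at count 6, right after the second sample.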

The "sample" pulse drives a shift register that moves by 4 bits (note that MSB is set as '0'), and lower 3 bits capture the RGB color:

on_sample_pulse: process(sample_pulse, r, g, b, sample)
begin
	if (rising_edge(sample_pulse)) then
		sample <= sample(3 downto 0) & '0' & r & g & b;
	end if;
end process;

How is the sampled color byte (containing 2 pixels) stored in the memory?

The scan line is typically 256 pixels, which means 128 bytes, so 7 bits. Then there are 192 rows, which fit in 8 bits. So the 15-bit address is:

VVVVVVVVHHHHHHH

-- output signals
d <= sample;

a <= v_off(7 downto 0) & h_off(7 downto 1);
we_out <= write_pulse and (not h_off(8)) and (not v_off(8));

-- offset to ignore "left" before real pixel data comes in
h_off <= std_logic_vector(unsigned(h) + unsigned(h_offset(8 downto 0)));--unsigned(limit(2 downto 0) & "00"));
-- offset to ignore "top" before real pixel data comes in
v_off <= std_logic_vector(unsigned(v) + unsigned(v_offset(8 downto 0)));--unsigned(limit(5 downto 3) & "00"));
v_ok <= '0' when (unsigned(v_off) > 191) else '1';

However, V and H are not direct horizontal or vertical counters. The pixels do not start right after the VSYNC and HSYNC signals; there are "porches" that delay the start. So both directions have offsets that can be tweaked using 2 up/down counter registers:

h_reg: offsetreg Port map ( 
				reset => reset,
				initval => "1111100110", -- -26 (0x3E6)
				mode => offsetcmd(1 downto 0),
				clk => offsetclk,
				sel => '0',
				outval => h_offset
			);

v_reg: offsetreg Port map ( 
				reset => reset,
				initval => "1111100101", -- -27 (0x3E5)
				mode => offsetcmd(3 downto 2),
				clk => offsetclk,
				sel => '0',
				outval => v_offset
			);
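
The combined effect of the offsets and the address packing can be sketched in Python. This model assumes that `h_off` and `v_off` are 9-bit values (the VHDL indexes bit 8, but the declarations are not shown) and uses the reset values of the two offset registers; `sampler_address` is a name invented for this sketch.

```python
H_OFFSET = 0b111100110   # low 9 bits of initval "1111100110", i.e. -26
V_OFFSET = 0b111100101   # low 9 bits of initval "1111100101", i.e. -27


def sampler_address(h: int, v: int):
    """Return (address, write_enable) for raw pixel counters h and v."""
    h_off = (h + H_OFFSET) & 0x1FF        # 9-bit wrap-around add (assumed width)
    v_off = (v + V_OFFSET) & 0x1FF
    # a <= v_off(7 downto 0) & h_off(7 downto 1): 8 row bits, 7 byte-column bits
    address = ((v_off & 0xFF) << 7) | ((h_off >> 1) & 0x7F)
    # writes are suppressed while either offset counter is still "negative"
    write_enable = not (h_off & 0x100) and not (v_off & 0x100)
    return address, write_enable


print(sampler_address(26, 27))     # first visible pixel: (0, True)
print(sampler_address(281, 218))   # last pixel of row 191: (24575, True)
print(sampler_address(25, 27))     # still in the left porch, write disabled
```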

Driving V9958 using Propeller

03/29/2021 at 04:05

The Propeller Spin code used to drive the design for test purposes was written years ago, for a different project.

However, it could be repurposed here with only minimal changes. That was possible because:

- V99X8 VDPs are truly backward compatible with TMS9918
- No special 99X8 modes are being used
- No extended registers are being used (only a single address line is used)

The Parallax Propeller is a very powerful chip - it contains 8 32-bit CPUs that can control 32 I/O pins. This allows direct interfacing with legacy chips in speed ranges below 10MHz or so. Besides VDPs, for example, I was able to drive an Am9511 FPU too.

This project has only 2 files:

TMS9918.spin

This is the VDP driver. It is interfacing the physical pins and drives them as if the VDP is on a bus of a microcomputer.

CON
'Signal     Propeller pin   VDP pin ( == F18A pins)
nRESET =    27'12'             34 == pull low for reset
MODE =      26'11'             13 == memory/register mode
nCSW =      25'10'             14 == write to register or VDP memory
nCSR =      24'9'      '       15 == read from register or VDP memory
nINT =      23'8'              16 == input always, activated after each scan line if enabled
CD0 =       7'              24 == MSB (to keep with "reverse" TMS99XX family documentation)
CD1 =       6'              23
CD2 =       5'              22
CD3 =       4'              21
CD4 =       3'              20
CD5 =       2'              19
CD6 =       1'              18
CD7 =       0'              17 == LSB
'VSS                        12 == GND
'VCC                        33 == +5V

Programming the Propeller has many interesting aspects; one of the most important is how to make multiple CPUs ("cogs") work in parallel. Each cog can drive its own pins, but when the cog is stopped, those pins are "released". To ensure the pins toward the VDP are constantly driven, a cog is initialized and then kept in a "dead loop".

The public "Start" method receives a pointer to the shared memory command buffer (described later) and, after some housekeeping, kicks off the _vdpProcess() routine in a new cog.

PUB Start(plCommandBuffer, initialMode, useInterrupt, enableTracing) : success

  longfill(@stack, 0, STACK_LEN)
  skipTrace := true
  if (enableTracing)
    pst.Start(115_200)
    pst.Clear
    skipTrace := false

  Stop

  plCommand := plCommandBuffer
  longfill(@spriteSpeed, 0, 32)
  colorGraphicsForeAndBack := byte[@GoodContrastColorsTable]

  _prompt(String("Press any key to continue with TMS9918 object start using command buffer at "), plCommand)

  lockCommandBuffer := locknew
  if (lockCommandBuffer == -1)
    _logError(String("No locks available to start object!"))
    return false
  else
    cogCurrent := cognew(_vdpProcess(initialMode, useInterrupt), @stack)
    if (cogCurrent == -1)
      _logError(String("No cogs available to start object!"))
      lockret(lockCommandBuffer~)
      return false
  waitcnt((clkfreq * 1) + cnt)
  _logTrace(String("TMS9918 object launched into cog "), cogCurrent, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)
  return true

The cog now runs the routine until it exits or another cog kills it from outside. The _vdpProcess() routine does the following:

- initializes the pins (input / output)
- fills the 16k of video memory (with a 10101010 pattern)
- sets the initial video mode

After that, it goes into an infinite loop of watching for a command and its parameters, and executes them when received. This is very similar to the Windows message processing paradigm: as long as the window exists, it has a "message pump" that accepts commands sent to it and executes them (one can even say that the cog is the "hWnd").

The commands are "long" (32-bit) values written to a common RAM memory area. This is again similar to the Windows CMD, lParam and wParam mechanism, but to simplify, the number of parameters here is flexible based on the command:

PRI _vdpProcess(initialMode, useInterrupt) |i, y, timer
  _logTrace(String("TMS9918 object starting in cog "), cogId, String(" using lock "), lockCommandBuffer, String(" at clkfreq "), clkfreq, 0)

  nextCharRow := 0
  nextCharCol := 0
  if (useInterrupt)
    vdpAccessWindow := ((((clkfreq / 60) * (262 - 192)) / 262) * 95) / 100 'see table 3.3 in TMS9918 documentation (we have 70 scan lines every 1/60s)
  else
    vdpAccessWindow := clkfreq / 60
  _logTrace(String("Initial mode is "), initialMode, String(" use interrupt is "), useInterrupt, String(" vdp access clock cycles is "), vdpAccessWindow, 0)

  outa[nReset .. CD7]~~         'set all to 1 (inactive)
  dira[nReset .. CD7]~          'set all to input first
  dira[nReset .. nCSR]~~        'these are always outputs
  _vdpReset
  _setReg(1, reg[1] & %1011_1111) 'blank screen
  lastStatus := _readStatus
  _fillVdpMem(0, 16 * 1024, 170, 0) '10101010 pattern
  'this is the first command that will be executed
  long[plCommand][0] := CMD_SETMODE
  long[plCommand][1] := initialMode
  displayMode := initialMode
  longfill(@lastSpritePositionUpdateCnt, cnt, 32)
  repeat  'keep executing commands until cog is stopped
    repeat until not lockset(lockCommandBuffer) 'wait for the free lock (don't execute while command buffer is updated)

    'update position of even numbered sprites according to their speed, if set
    _updateSpritePositions(0)

    timer := cnt
    case LONG[plCommand]
      CMD_SETSPRITEMODE:
        _setSpriteMode(long[plCommand][1] & %0000_0011)
        '_logCommand(String("CMD_SETSPRITEMODE in mode "), _interval(cnt, timer))

... (OTHER COMMANDS)

This mechanism could allow:

- FIFO buffering of commands (the driver component can work async and "stuff" commands to some preset depth and continue processing while the VDP cog executes)
- Multiple cogs can interface independently with various chips - with enough pins, 2 VDPs can be driven independently etc.
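
The command-buffer idea can be illustrated with a toy Python model. This is a single-threaded sketch only: the command codes are hypothetical (the real values live in TMS9918.spin), the lock is left out, and clearing the command slot after execution is an assumption about how the buffer is marked free.

```python
# Hypothetical command codes, chosen for this sketch only.
CMD_NONE, CMD_SETMODE, CMD_DRAWCIRCLE = 0, 1, 2


class VdpCog:
    """Toy model of the VDP cog's message pump over a shared buffer of longs."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.log = []

    def step(self):
        """Execute the pending command, if any, then mark the buffer free."""
        cmd = self.buffer[0]
        if cmd == CMD_SETMODE:
            self.log.append(("mode", self.buffer[1]))
        elif cmd == CMD_DRAWCIRCLE:
            self.log.append(("circle", *self.buffer[1:5]))
        self.buffer[0] = CMD_NONE   # assumption: slot 0 cleared when done


def post(buffer, cmd, *params):
    """Caller side: write the parameters first, then the command long."""
    buffer[1:1 + len(params)] = params
    buffer[0] = cmd


shared = [CMD_NONE] * 8
cog = VdpCog(shared)
post(shared, CMD_DRAWCIRCLE, 128, 96, 40, 1)
cog.step()
print(cog.log)   # prints [('circle', 128, 96, 40, 1)]
```

Writing the command long last means the executing side never sees a command whose parameters are only half written, which is the same concern the lock addresses in the Spin code.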

Let's see how a sample command is executed, for example drawing a circle:

      CMD_DRAWCIRCLE:
        _drawCircle(long[plCommand][1], long[plCommand][2], long[plCommand][3], long[plCommand][4])
        '_logCommand(String("CMD_DRAWCIRCLE in mode "), _interval(cnt, timer))

Circle takes 4 parameters which are the coordinates of the center, radius, and color (which can be 0 or 1 in hi-res, or 0-3 in multicolor modes)

PRI _drawCircle(xc, yc, radius, color) |x, y, x2, y2, r2, x2m, pixCount
  '_logTrace(String("Drawing circle in color "), color, String(" at "), xc << 16 | yc , String(" with radius "), radius, 8)
  if (radius < 1)
    return 0
  pixCount := 0
  x := radius
  y := 0
  r2 := radius * radius
  x2 := r2
  y2 := 0
  repeat while (y =< x)
    pixCount += _drawPixel(xc + x, yc + y, color)
    pixCount += _drawPixel(xc + x, yc - y, color)
    pixCount += _drawPixel(xc - x, yc + y, color)
    pixCount += _drawPixel(xc - x, yc - y, color)
    pixCount += _drawPixel(xc + y, yc + x, color)
    pixCount += _drawPixel(xc + y, yc - x, color)
    pixCount += _drawPixel(xc - y, yc + x, color)
    pixCount += _drawPixel(xc - y, yc - x, color)
    y2 := y2 + y + y + 1
    y++
    x2m := x2 - x - x + 1
    if (_circleError(x2m, y2, r2) < _circleError(x2, y2, r2))
      x--
      x2 := x2m
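
For readers who do not have a Propeller at hand, the same loop can be tried in Python. Note that `_circleError` is not shown in the listing above, so the definition used here (absolute distance of x² + y² from r²) is an assumption, and the function returns the set of plotted points instead of drawing them.

```python
def circle_error(x2: int, y2: int, r2: int) -> int:
    # Assumed definition; the Spin source of _circleError is not shown.
    return abs(x2 + y2 - r2)


def circle_points(xc: int, yc: int, radius: int) -> set:
    """Return the pixels the Spin loop would plot, using 8-way symmetry."""
    points = set()
    if radius < 1:
        return points
    x, y = radius, 0
    r2 = radius * radius
    x2, y2 = r2, 0                     # running values of x*x and y*y
    while y <= x:
        for dx, dy in ((x, y), (y, x)):
            for sx in (1, -1):
                for sy in (1, -1):
                    points.add((xc + sx * dx, yc + sy * dy))
        y2 += 2 * y + 1                # (y + 1)^2 = y^2 + 2y + 1
        y += 1
        x2m = x2 - 2 * x + 1           # (x - 1)^2 = x^2 - 2x + 1
        if circle_error(x2m, y2, r2) < circle_error(x2, y2, r2):
            x -= 1                     # stepping inward fits the circle better
            x2 = x2m
    return points


print(sorted(circle_points(0, 0, 1)))   # prints [(-1, 0), (0, -1), (0, 1), (1, 0)]
```

The squares are updated incrementally, so the loop needs no multiplication beyond the initial radius * radius.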

At the bottom of the execution stack are the routines that drive the VDP signals in order to write a command or data, or read status or data, including generating a reset:

{{ interfacing with VDP chip }}
PRI _readStatus
  return _vdpRead(1)

PRI _vdpRead(modeVal)
  if (modeVal == 0)             'only wait if reading from vdp memory, not status reg
    _waitForScan
  outa[MODE] := modeVal         'set mode
  outa[nCSW]~~                  'write inactive
  outa[nCSR]~~                  'read inactive
  dira[CD0 .. CD7]~             'data bus is input
  outa[nCSR]~                   'pulse nCSR

  result := ina[CD0 .. CD7]
  outa[nCSR]~~

PRI _vdpWrite(byteVal, modeVal)
  if (modeVal == 0)             'only wait if writing to vdp memory, not register
    _waitForScan
  outa[MODE] := modeVal         'set mode
  outa[nCSW]~~                  'write inactive
  outa[nCSR]~~                  'read inactive
  dira[CD0 .. CD7]~~            'data bus is output
  outa[CD0 .. CD7] := byteVal
  outa[nCSW]~                   'delay
  outa[nCSW]~~

PRI _vdpReset
  outa[nReset]~
  waitcnt((clkfreq / 2) + cnt) '500ms
  outa[nReset]~~

TMS9918_test.spin

In Propeller parlance, this is the "top level" object code, that is started up at boot time. Its purpose is to exercise various modes and options of the VDP to show its working on the screen. As a parameter, it takes the state of 4 switches on the Propeller demo board to either run all the demos or generate test picture to adjust the colors (== screwdriver and potentiometers!) or timings (== switches on FPGA board):

PUB Main | mode, rnd, switches
  waitcnt((clkfreq * 4) + cnt) 'wait 4s before start

  if vdp.Start(@CommandBuffer, vdp#GRAPHICS1, false, true)

    repeat true
      'read switches and if color is TRANSPARENT (== 0) continue with demo, otherwise show solid color screen
      dira[13..10]~ 'set as input
      switches := ina[13..10]
      if (switches < 8)
        vdp.Trace(String("Switches are in COLOR (< 8) mode, displaying 8 vertical color bars for calibration "), switches)
        vdp.SetMode(vdp#MULTICOLOR)
        _colorfulBlocks(byte[@ColorPalette8][switches])
      else
        if (switches > 8)
          vdp.Trace(String("Switches are in TICK (> 8) mode, tick lines (every 8 pixels) "), switches)
          vdp.SetMode(vdp#GRAPHICS2)
          _tickLines(byte[@ColorPalette8][switches - 8])
        else
          vdp.Trace(String("Switches are in DEMO (== 8) mode, all running demos "), switches)
          repeat mode from vdp#TEXT to vdp#GRAPHICS1

(demo cases)

Here is, for example, a demo that generates 8 sprites and sets them wandering across the screen in various directions:

PRI _spriteDemo(char, waitSecs) |dx, dy, i, rnd
  vdp.SetSpriteMode(vdp#SPRITESIZE_16X16 | vdp#SPRITEMAGNIFICATION_2X)
  repeat i from 0 to 7
    vdp.GenerateSpritePatternFromChar(@SpriteTestPattern16, char + i, 32)
    vdp.SetSpritePattern(i * 4, @SpriteTestPattern16, 32)
    vdp.SetSprite(i, vdp#SPRITEMASK_SETPATTERN | vdp#SPRITEMASK_SETCOLOR | vdp#SPRITEMASK_SETX | vdp#SPRITEMASK_SETY, i * 4, vdp.SpriteHPixelCount / 2 - 16, vdp.SpriteVPixelCount / 2 - 16, 15 - i)
  'give speed vectors to sprites and send them off autonomously
  vdp.SetSprite(0, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0,  1,  0, 0)
  vdp.SetSprite(1, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0,  1, -1, 0)
  vdp.SetSprite(2, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0,  0, -1, 0)
  vdp.SetSprite(3, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0, -1, -1, 0)
  vdp.SetSprite(4, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0, -1,  0, 0)
  vdp.SetSprite(5, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0, -1,  1, 0)
  vdp.SetSprite(6, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0,  0,  1, 0)
  vdp.SetSprite(7, vdp#SPRITEMASK_VX | vdp#SPRITEMASK_VY, 0,  1,  1, 0)
  repeat waitSecs
    vdp.WaitASecond

It is interesting to note that the vdp.SetSprite() function is executed by the "current cog", not the one driving the VDP. But the execution is really just preparing the command and parameters to be written to common RAM (all cogs share common RAM, accessed on a round-robin basis), after which the SetSprite() function exits. The VDP cog then reads the command from common RAM and drives the sprite across the screen:

PRI _setSprite(spriteId, mask, patternId, x, y, color) |spriteAttributeAddress
  spriteAttributeAddress := SpriteAttributeTable + (spriteId << 2)
  _copyFromVdpMem(spriteAttributeAddress, @SpriteBuff, 4)
  '_logSprite(String("Sprite before "), spriteAttributeAddress, @SpriteBuff)
  if (mask & SPRITEMASK_SETY)
    byte[@SpriteBuff][0] := y
  else
    if (mask & SPRITEMASK_DY)
      byte[@SpriteBuff][0] += y
    else
      if (mask & SPRITEMASK_VY)
        byte[@spriteSpeed + (spriteId << 1)][1] := y
  if (mask & SPRITEMASK_SETX)
    byte[@SpriteBuff][1] := x
  else
    if (mask & SPRITEMASK_DX)
      byte[@SpriteBuff][1] += x
    else
      if (mask & SPRITEMASK_VX)
        byte[@spriteSpeed + (spriteId << 1)][0] := x
  if (mask & SPRITEMASK_SETPATTERN)
    byte[@SpriteBuff][2] := patternId
  if (mask & SPRITEMASK_SETCOLOR)
    byte[@SpriteBuff][3] := (byte[@SpriteBuff][3] & $F0) | (color & $0F)
  '_logSprite(String("Sprite after  "), spriteAttributeAddress, @SpriteBuff)
  _copyToVdpMem(spriteAttributeAddress, @SpriteBuff, 4)

Flash A/D converter for analog R, G, B

03/29/2021 at 04:04

Unlike their TMS99X8 video display ancestors used in MSX (and many other home computers and game consoles), the Yamaha V9938 / V9958 VDPs generate analog R, G, B along with sync signals:

Variation	Output	Input	DRAM
TMS9918A	60Hz NTSC composite	60Hz NTSC composite	16k x 1bit
TMS9928A	60Hz YPbPr		16k x 1bit
TMS9929A	50Hz YPbPr		16k x 1bit
TMS9118	60Hz NTSC composite	60Hz NTSC composite	16k x 4bit
TMS9128	60Hz YPbPr		16k x 4bit
TMS9129	50Hz YPbPr		16k x 4bit

The voltage level on RGB outputs is in the following range:

The threshold voltage level must be set somewhere above VRGB0 and below VRGB7 - matched to the specific VDP driving the circuit.

To feed the FPGA with digital R, G, B, an A/D converter is needed. There are two main concerns here:

- speed: the pixel clock is XTAL/4 = 21.47727/4 = 5.3693175MHz. This means the A/D conversion must complete in much less than one pixel period of about 186ns
- resolution: the absolute minimum needed is 1 bit - color is present or not
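
The timing budget follows directly from the crystal frequency. A short Python calculation, using only the figures given above:

```python
XTAL_HZ = 21_477_270            # 21.47727 MHz master crystal
PIXEL_HZ = XTAL_HZ / 4          # one pixel every 4 XTAL cycles
pixel_period_ns = 1e9 / PIXEL_HZ

print(f"pixel clock  = {PIXEL_HZ / 1e6:.7f} MHz")
print(f"pixel period = {pixel_period_ns:.1f} ns")
```

The comparator output has to settle well inside that period, since the sampling instant can also be shifted by the delay line inside the FPGA.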

One could of course use fast, high-precision, and expensive A/D converters. But for the proof of concept purposes, a super cheap voltage comparator circuit is sufficient:

When the voltage on the LM339 + input is greater than on the - input, the output is "high" - meaning color is detected.

The voltage cutoff point is determined by running the demo code and tweaking the potentiometer positions with a screwdriver until the colors look acceptable:

The 1k pull-up resistors are pure ad-hoc improvisation too. While prototyping the circuit on the breadboard I found that having them improves the picture quality, probably by producing faster output rise times.

Other signals are directly led from VDP to FPGA:

- VIDEO_CSYNC - this signal contains both VSYNC and HSYNC components. The VSYNC is extracted from it in the FPGA. VSYNC frequency is 15.7kHz/262 = approximately 60Hz.
- VIDEO_HSYNC - a positive pulse denotes the start of a new scan line. The frequency is XTAL/1368 = 15.7kHz
- VDP_CPUCLK - this is the XTAL/6 = 3.579545MHz signal. It is multiplied by 12/2 in order to regenerate the XTAL frequency inside the FPGA
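
The relationships between these frequencies can be verified with a few lines of Python. The variable names are made up for this sketch; the numbers come from the text above.

```python
CPUCLK_HZ = 3_579_545                 # VDP pin 8, XTAL/6
dcm_hz = CPUCLK_HZ * 12               # DCM output, 42.95454 MHz
xtal_hz = dcm_hz / 2                  # regenerated XTAL, 21.47727 MHz
hsync_hz = xtal_hz / 1368             # one scan line every 1368 XTAL cycles
vsync_hz = hsync_hz / 262             # 262 scan lines per frame

print(f"XTAL  = {xtal_hz / 1e6:.5f} MHz")
print(f"HSYNC = {hsync_hz / 1e3:.2f} kHz")
print(f"VSYNC = {vsync_hz:.2f} Hz")
```

The frame rate comes out slightly below 60Hz, which is the usual NTSC-style figure.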

Test rig
03/29/2021 at 04:02

The sketch below describes key hardware components of this proof of concept:
Propeller proto-board
This board is out of production, but any proto-board with Propeller can be used. It is convenient that the number of signals that need to be driven is small: 8 data + 4 control lines only. So smaller boards with 16 connections to the breadboard are sufficient.
V9958 board
I used the high-quality kit board originally meant for the rosco_m68k MC68000 computer. A few small hardware hacks were needed because the board adapter is set up for the MC68000 bus (J1), while the Propeller allows direct interfacing with the VDP, without glue logic. So I removed one GAL from the board, and connected the /RD and /WR signals directly, bypassing the Motorola bus R/nW logic.
I use the J2 output pins to tap into the VDP signals (not the DIN output)
Flash A/D board
This one is described separately, but is nothing more than 3 voltage comparators with potentiometers to tweak the voltage cutoff separately for R, G, B and some pull-up resistors on the outputs. The result is an RGB 3-bit digital color signal.
FPGA board
I used Mercury FPGA, a very convenient, economical and high quality board from MicroNova. Older Xilinx FPGA chip can be programmed using old but free ISE14.7 IDE, and the baseboard has VGA output. The signals are coming through PMOD. PMOD has 8 I/O pins, in this case 6 are used, 3 for RGB and 3 for control signals (HSYNC, CSYNC, CPU_CLOCK = XTAL/6)
