HLSL Syntax Highlighting for Sublime Text 3

Don’t worry, this’ll be a quick one.

If you write shaders, then you’ve certainly dealt with the issue of what to write them in.  Since there’s no true shader IDE (And seriously, why isn’t there?  Shaders are mainstream enough at this point that I can’t imagine it’s a niche market anymore, which is what I assume is the reason for ignoring it.), you find a text editor that’s lightweight and customizable enough to help you get the job done as well as possible and you just deal with it.  And in a lot of cases, someone eventually writes a basic syntax highlighter for your editor of choice, and most of us are so happy to get anything at all that we embrace it and carry on.  I experienced it with NShader on Visual Studio, with MJP’s file for Notepad++, and with what I started with for Sublime Text 3 (sorry to whoever wrote it, I honestly don’t remember where it came from at this point).

In the year or so that I’ve been using ST3, I’ve found it easy enough to customize and extend it that I’ve been highly motivated to actually do those things.  Part of that work has been building my own syntax file for HLSL.  Now, my syntax file isn’t overly complex or even as complete as I’d like it to be, but… regex (Sidenote:  if someone much better at regex than me looks at this and laughs at my pitifulness, I fully welcome help making improvements!).  However, what I did was define function definitions and callsites, and that opens up an interesting possibility in ST3 beyond just more syntax highlighting; with that information, if you put your shader files into a sublime-project file, you get automatically hooked up to ST3’s Goto Definition framework.  Here’s what that looks like:

And, once you have this working, it’s a short jump to getting a library-wide autocomplete functioning.  Unfortunately, I cannot share my autocomplete indexing solution, but I can share my syntax file.  Here it is!  Hopefully you enjoy it, and again I fully welcome suggestions for improvements.  Also, here is my theme file, if you’re interested in a very HLSL-specific color coding, but it’s absolutely not required.

How I Spent My Summer Vacation

Last summer I got the opportunity to intern at Blizzard and while it’s something that I’ve mentioned, I haven’t really delved into the experience or what I worked on at all.  That is, until now!  Because features I worked on are now shipping and I am super stoked to see them and to share them.

Before I get started, I will say that in the interest of not violating anything I signed with Blizzard, I won’t be talking too deeply about technical implementation.  On the other hand, while I’m very proud of what I did during my internship, none of it is anything I would say is cutting edge and therefore it’s probably pretty easy to replicate without me walking you through it.

So, my internship was as an engine programmer on Team 1 (Starcraft 2/Heroes of the Storm), and my work was primarily in graphics.  I did a wide range of things: working on the core renderer, writing some HLSL, working new features into the art pipeline, and doing a lot of bug fixing.  It was a very rewarding experience, and an amazing time in my life.  One of the bigger features I implemented was occlusion-based unit outlining, and it’s something that’s seeing pretty widespread usage in the Heroes technical alpha right now.

The concept here is pretty straightforward.  Designers wanted to be able to tag objects as occluders or occludees (or neither), and then have any portion of an occludee that’s occluded by an occluder be outlined.  And they wanted it done in team color.  Up to this point I hadn’t ever used the stencil buffer in my work, but I was familiar enough with it to see it as a logical choice to handle this.  Occluders and occludees were each masked with different bits of the stencil buffer, the blur would then mask a third stencil bit, and the final copy/resolve would only operate where there was an overlap of the occluder/blur bits and no overlap with the occludee bit.  That made the effect work, but there were a lot of additional considerations regarding performance and how much it actually helped players that I won’t really be talking about.  And that may or may not be that interesting anyway.
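To make the bit logic concrete, here is a minimal CPU-side sketch of that final test.  The bit assignments and every name in it (kOccluderBit, StencilEqual, ShouldDrawOutline) are made up for illustration and are not the values or code used in the game; the comparison simply mirrors how a stencil test with a read mask and an EQUAL comparison behaves.

```cpp
#include <cstdint>

// Hypothetical bit assignments; the real values are not given in this post.
constexpr std::uint8_t kOccluderBit = 0x01;
constexpr std::uint8_t kOccludeeBit = 0x02;
constexpr std::uint8_t kBlurBit     = 0x04;

// Mimics a stencil test using an EQUAL comparison:
// passes when (stencil & readMask) == (reference & readMask).
constexpr bool StencilEqual(std::uint8_t stencil, std::uint8_t reference, std::uint8_t readMask)
{
    return (stencil & readMask) == (reference & readMask);
}

// The outline is resolved only where the occluder and blur bits are both set
// and the occludee bit is clear.
constexpr bool ShouldDrawOutline(std::uint8_t stencil)
{
    constexpr std::uint8_t readMask  = kOccluderBit | kOccludeeBit | kBlurBit;
    constexpr std::uint8_t reference = kOccluderBit | kBlurBit;
    return StencilEqual(stencil, reference, readMask);
}
```

Because all three bits are in the read mask and the reference leaves the occludee bit clear, a single EQUAL test covers both the "overlap" and the "no overlap" conditions.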

Here’s a video where you can see my effect in action!

While I worked on a lot of other things during my internship, that was by far the biggest feature, and the one I can most readily point at and say, “I did that!”.  And it’s shipped.  I cannot stress enough how rad it is to see something I did be so prevalent in an amazing game like Heroes of the Storm.

So, that’s it for Blizzard work.  I have a number of posts to make about work on my own renderer, and those should be coming soon.  I made a lot of fixes, tweaks, and updates to shaders and general math that improved things a lot in the final month or so of the semester and I’m pretty excited to share them as well.  But, I’m also just getting started with summer classes to finish out my degree, so it might be a little while before I get them written up.  We’ll see!

I Have No Clue What I’m Doing! (Part 1 Of… A Lot)

So, this is a series that I imagine will continue for… ever.  But it will definitely be something that pops up a lot in the early post-school years.  And would have popped up a lot during school if I had thought to approach it this way at the time.  But I didn’t.  Yet another way in which I have no clue what I’m doing.  Haha?  But, in all seriousness, the idea of this post (and the many that will follow) will be to illustrate how things that I thought to be correct or good ideas earlier in the blog turned out to be very bad ideas after implementation and thorough testing and iteration happened.  Who would have guessed that I wasn’t going to get everything right the first time always?  …Everyone.

This first entry in the series concerns my entity transfer buffer and my general threading model considerations, as described in a blog post from earlier this year.  In it, I devoted a bit of code to solving concurrency concerns as they related to updating renderable data from the core engine into my rendering library and to maintaining data that might be getting rendered after the core tried to delete it.  To the second point, more experience with how DirectX 11 command buffers store “in-flight” data quickly made it clear that I didn’t need to handle this at all.  To the first point, the code presented didn’t even fully solve the problem it was trying to address (the way I was buffering add/remove/update commands into the container still had data race issues), plus it introduced a memory leak in the buffered update, and in general presented a terrible calling convention where I expected the external caller to allocate a data struct that I would internally delete.  Just… truly horrific stuff.  But, on top of all of that, I’ve come to realize that a lot of this code was written to solve a problem I shouldn’t be trying to deal with in the first place.

Our engine was built as a synchronous core with potentially asynchronous systems.  At least for this project, attempting to write a thread-safe interface into my rendering library was absolutely overkill since there weren’t any threading-related issues with transferring data from the core into my library.  By making my code only as complicated as it needed to be, rather than as complicated as it could be, I made it a lot more stable and functional.  Of course, this opens up another line of questioning; if I wanted to clean up my code and make my library publicly available, wouldn’t it be smart to have a thread-safe interface?  And to that, I’d say… maybe.  It might be nice, but unless you’re making a truly professional-grade, tip-top engine where absolutely everything must run at maximum performance, I’m not sure that making the core architecture asynchronous is a great idea.  For smaller scale engines like the one that we built this year, it makes a lot more sense to keep the core simple and let each system handle its own threading as it sees fit.  You still get a lot of good performance and generally decent scalability this way, without all of the headache and hassle of managing an entire engine’s worth of thread ordering, syncing, etc.

In the end, this is what my TransferBuffer class ended up being:


template <typename t_entry, typename t_data, typename t_id>
class TransferBuffer
{
typedef typename std::unordered_map<t_id, t_entry>::iterator t_iterator;
private:
 std::unordered_map<t_id, t_entry> m_entries;
 t_id m_nextID;
public:
 TransferBuffer() {
   m_nextID = 0;
 }

 ~TransferBuffer() {
 }

 //This should only ever be called by the game engine!
 t_id AddEntry(t_data pData) {
   t_id lReturnID = m_nextID++;

   t_entry lEntry = pData->CreateEntity();
   m_entries.emplace(lReturnID, lEntry);

   return lReturnID;
 }

 //This should only ever be called by the game engine!
 void RemoveEntry(t_id pID) {
   auto lIter = m_entries.find(pID);

   if (lIter != m_entries.end())
   {
     t_entry lEntry = lIter->second;
     delete lEntry;
     m_entries.erase(lIter);
   }
 }

 //This should only ever be called by the game engine!
 void UpdateEntry(t_id pID, t_data pData) {
   auto lIter = m_entries.find(pID);

   if (lIter != m_entries.end())
   {
     lIter->second->UpdateData(pData);
   }
 }

 //This should only ever be called by parallel producer threads!
 t_iterator GetEntries() {
   return m_entries.begin();
 }

 t_iterator GetEnd() {
   return m_entries.end();
 }

 //This should only be called from the interface to expose mesh data for physics!
 t_entry GetFromID(t_id pID) {
   auto lIter = m_entries.find(pID);
   t_entry lEntry = nullptr;

   if (lIter != m_entries.end())
   {
     lEntry = lIter->second;
   }

   return lEntry;
 }
};

And then, of course, this class didn’t really need to be a template at all.  Since all entity and entitydata objects inherit from the same base IEntity and EntityData types, I was able to make an array of TransferBuffer<IEntity*, EntityData*, unsigned> to store my various entity types.  And this allowed me to remove gross switch statements from each operation that my entity manager had to perform, instead accessing the array based on an enum that defined index by entity type.  So, in the end, a lot less code, a lot more stability, and nothing really lost in the translation.
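As a rough illustration of that last step, here is a small sketch of indexing an array of buffers by an entity-type enum instead of switching on the type.  Everything in it is a stand-in: the IEntity and EntityData bodies, the EntityType values, the EntityManager class, and the trimmed-down non-template TransferBuffer are all hypothetical, and the real code creates entities through pData->CreateEntity() rather than with new.

```cpp
#include <array>
#include <cstddef>
#include <unordered_map>

// Stand-in types; the real IEntity / EntityData interfaces are not shown in this post.
struct EntityData { int value = 0; };
struct IEntity {
    int value = 0;
    virtual ~IEntity() = default;
    void UpdateData(const EntityData* pData) { value = pData->value; }
};

// Hypothetical entity categories; kCount doubles as the array size.
enum class EntityType : std::size_t { Mesh, Light, Particle, kCount };

// Non-template buffer, reduced to the operations needed for this example.
class TransferBuffer {
    std::unordered_map<unsigned, IEntity*> m_entries;
    unsigned m_nextID = 0;
public:
    TransferBuffer() = default;
    TransferBuffer(const TransferBuffer&) = delete;            // owns raw pointers
    TransferBuffer& operator=(const TransferBuffer&) = delete;
    ~TransferBuffer() { for (auto& lPair : m_entries) delete lPair.second; }

    unsigned AddEntry(const EntityData* pData) {
        IEntity* lEntry = new IEntity();
        lEntry->UpdateData(pData);
        m_entries.emplace(m_nextID, lEntry);
        return m_nextID++;
    }
    IEntity* GetFromID(unsigned pID) const {
        auto lIter = m_entries.find(pID);
        return lIter != m_entries.end() ? lIter->second : nullptr;
    }
};

class EntityManager {
    std::array<TransferBuffer, static_cast<std::size_t>(EntityType::kCount)> m_buffers;
    TransferBuffer& BufferFor(EntityType pType) {
        return m_buffers[static_cast<std::size_t>(pType)];
    }
public:
    // One line per operation; the enum picks the buffer, so no switch is needed.
    unsigned Add(EntityType pType, const EntityData* pData) { return BufferFor(pType).AddEntry(pData); }
    IEntity* Get(EntityType pType, unsigned pID) { return BufferFor(pType).GetFromID(pID); }
};
```

Each buffer hands out its own IDs, so an ID is only meaningful together with the entity type it was created under.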

And, that’s it for the first installment of me being incredibly wrong about things.  In other news, I gave a talk about my actual, finished multi-threaded rendering system at Game Engine Architecture Club last month, so once that video gets uploaded to YouTube expect a post about it with links to my slides.  Also, I recently got into the Heroes of the Storm tech alpha and was delighted to see features that I wrote last summer in heavy use in the game (!!!), so also expect a post about that in the very near future.  Stay tuned for those updates; otherwise, it’s the final push through finals and into graduation, followed by plenty of sleep!

Why I Love DirectX Shader Reflection. Also, Why I Hate DirectX Shader Reflection.

Based on the title, it should be obvious that this isn’t the updated talk on my multithreaded renderer I mentioned last post. That whole write-up is still in the works, but today I thought I’d jump into D3D Shader Reflection, and how it is both great and worth implementing, and completely ruined by its limitations.

So, my shader manager has undergone very little change since I first implemented it last year for Evac.  It’s a very simple class, just designed to hold all of my compiled shaders, keep track of the last shader used, and allow materials to switch shaders on demand.  Very bare bones, but very functional.  The biggest problem with it has been that I have to manually do too many things when I want to add a new shader: I have to write the HLSL, I have to add an entry to my shaders enum, I have to write a new C++ class for the shader, and then I have to update the shader manager, since each shader having its own class means I have to explicitly create and destruct the new types.  Ideally, I should only need to write the HLSL and update the enum, and everything else could happen automatically.
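For readers who haven’t seen this kind of class, a bare-bones sketch might look something like the following.  It is not the Evac code: the ShaderID values, the CompiledShader stand-in, and the choice to skip the bind when the requested shader is already active are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical shader list; kCount doubles as the array size.
enum class ShaderID : std::size_t { Opaque, Skinned, Particle, kCount };

// Stand-in for a compiled shader; a real one would set device state in Bind().
struct CompiledShader {
    int bindCount = 0;
    void Bind() { ++bindCount; }
};

class ShaderManager {
    std::array<CompiledShader, static_cast<std::size_t>(ShaderID::kCount)> m_shaders{};
    ShaderID m_lastUsed = ShaderID::kCount; // kCount means "nothing bound yet"
public:
    // Binds the requested shader unless it is already the active one.
    // Returns true when a bind actually happened.
    bool SetShader(ShaderID pID) {
        if (pID == m_lastUsed) return false;
        m_shaders[static_cast<std::size_t>(pID)].Bind();
        m_lastUsed = pID;
        return true;
    }
    const CompiledShader& Get(ShaderID pID) const {
        return m_shaders[static_cast<std::size_t>(pID)];
    }
};
```

With one generic CompiledShader type, adding a shader comes down to adding an enum entry, which is the end state described above.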

When I was rebuilding my framework this year, this was an area I investigated for improvement.  I was able to refactor my shader class code so that 95% of it was generic and capable of being a single class that could service all my compiled shaders.  The problem was the input layout generation.  Since that was something that had to be defined per vertex fragment (or at least, per unique vertex input struct), I had to have a way to generate this properly for each shader, and it led to still having a C++ class per shader even if 95% of it was copy/paste and the only unique code between them was the input layout declaration.  An improvement to be sure, but not nearly enough of one.

Now, I’ve been aware of, and interested by, the shader reflection system provided by D3D for a while, but I’ve always considered the time commitment to research, implement, and fix to not be worth it when I already had a working, albeit slightly tedious, shader system.  This week finally tipped the scales because I found myself avoiding trying to implement something via a new shader because I didn’t want to go through the hassle of the whole process if I didn’t have to.  So, I took the plunge.

Before I get into the source code, there are two things worth sharing.  The first is that I am using the shader reflection system solely to generate my input layout from my HLSL in an effort to create a single, generic shader class; the entirety of the system is very powerful and can do a lot more than the small subsection that I’m discussing here.  The second is that I took the basis of my implementation from this post by Bobby Anguelov, and it probably is worth a read if this is interesting to you at all.  With that said, here’s the function I wrote that generates my input layouts:


void CompiledShader::CreateVertexInputLayout(ID3D11Device* pDevice, ShaderBytecode* pBytecode, const char* pFileName)
{
 ID3D11ShaderReflection* lVertexShaderReflection = nullptr;
 if (FAILED(D3DReflect(pBytecode->bytecode, pBytecode->size, IID_ID3D11ShaderReflection, (void**) &lVertexShaderReflection)))
 {
  return;
 }

 D3D11_SHADER_DESC lShaderDesc;
 lVertexShaderReflection->GetDesc(&lShaderDesc);

 std::ifstream lStream;
 lStream.open(pFileName, std::ios_base::binary);
 bool lStreamIsGood = lStream.is_open();
 unsigned lLastInputSlot = 900;

 std::vector<D3D11_INPUT_ELEMENT_DESC> lInputLayoutDesc;
 for (unsigned lI = 0; lI < lShaderDesc.InputParameters; lI++)
 {
  D3D11_SIGNATURE_PARAMETER_DESC lParamDesc;
  lVertexShaderReflection->GetInputParameterDesc(lI, &lParamDesc);

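  //Zero-initialize so Format stays DXGI_FORMAT_UNKNOWN if none of the mask/type branches below match
  D3D11_INPUT_ELEMENT_DESC lElementDesc = {};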
  lElementDesc.SemanticName = lParamDesc.SemanticName;
  lElementDesc.SemanticIndex = lParamDesc.SemanticIndex;

  if (lStreamIsGood)
  {
   lStream >> lElementDesc.InputSlot;
   lStream >> reinterpret_cast<unsigned&>(lElementDesc.InputSlotClass);
   lStream >> lElementDesc.InstanceDataStepRate;
  }
  else
  {
   lElementDesc.InputSlot = 0;
   lElementDesc.InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
   lElementDesc.InstanceDataStepRate = 0;
  }

  lElementDesc.AlignedByteOffset = lElementDesc.InputSlot == lLastInputSlot ? D3D11_APPEND_ALIGNED_ELEMENT : 0;
  lLastInputSlot = lElementDesc.InputSlot;

  if (lParamDesc.Mask == 1)
  {
   if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_UINT32) lElementDesc.Format = DXGI_FORMAT_R32_UINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_SINT32) lElementDesc.Format = DXGI_FORMAT_R32_SINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_FLOAT32) lElementDesc.Format = DXGI_FORMAT_R32_FLOAT;
  }
  else if (lParamDesc.Mask <= 3)
  {
   if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_UINT32) lElementDesc.Format = DXGI_FORMAT_R32G32_UINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_SINT32) lElementDesc.Format = DXGI_FORMAT_R32G32_SINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_FLOAT32) lElementDesc.Format = DXGI_FORMAT_R32G32_FLOAT;
  }
  else if (lParamDesc.Mask <= 7)
  {
   if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_UINT32) lElementDesc.Format = DXGI_FORMAT_R32G32B32_UINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_SINT32) lElementDesc.Format = DXGI_FORMAT_R32G32B32_SINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_FLOAT32) lElementDesc.Format = DXGI_FORMAT_R32G32B32_FLOAT;
  }
  else if (lParamDesc.Mask <= 15)
  {
   if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_UINT32) lElementDesc.Format = DXGI_FORMAT_R32G32B32A32_UINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_SINT32) lElementDesc.Format = DXGI_FORMAT_R32G32B32A32_SINT;
   else if (lParamDesc.ComponentType == D3D_REGISTER_COMPONENT_FLOAT32) lElementDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
  }

  lInputLayoutDesc.push_back(lElementDesc);
 }

 lStream.close();

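 //Guard against an empty signature; taking the address of element 0 of an empty vector is undefined behavior
 if (!lInputLayoutDesc.empty())
 {
  pDevice->CreateInputLayout(lInputLayoutDesc.data(), static_cast<UINT>(lInputLayoutDesc.size()), pBytecode->bytecode, pBytecode->size, &m_layout);
 }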

 lVertexShaderReflection->Release();
}

Now, some places where I differ from Bobby’s implementation.

The easiest is the AlignedByteOffset.  He keeps track of how many bytes each parameter of the input struct takes up to calculate this as he goes.  However, the offset of the first element in a given input slot is always zero, and any following element in the same input slot can be given D3D11_APPEND_ALIGNED_ELEMENT and get the correct result.  A small difference, and his version works, but this is just simpler code and less prone to ever potentially being a headache.  Also, I suppose that I’m making some assumptions here that you’re not doing weird packing on your input structs that could otherwise break my code, but it’d also break Bobby’s so I don’t feel too bad about it.  You’ll also notice my lLastInputSlot variable, and how it starts at 900, and probably think that’s weird.  The API documentation says that valid values for input slots are 0 – 15, and I needed a starting value that can never match a real slot, so that the first element in slot 0 properly gets an offset of 0.  Any value > 15 would work.  I picked 900 for no good reason.
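Pulled out of the D3D types, the offset rule is small enough to sketch and check on its own.  In the version below, kAppendAligned stands in for D3D11_APPEND_ALIGNED_ELEMENT (which d3d11.h defines as 0xffffffff) and ComputeOffsets is a hypothetical helper, not part of the function above.  Like the original, it assumes that elements sharing an input slot are declared next to each other.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for D3D11_APPEND_ALIGNED_ELEMENT so the sketch builds without d3d11.h.
constexpr std::uint32_t kAppendAligned = 0xffffffffu;

// Given the input slot of each element in declaration order, returns the
// AlignedByteOffset each element would be assigned.
std::vector<std::uint32_t> ComputeOffsets(const std::vector<std::uint32_t>& pSlots)
{
    std::uint32_t lLastSlot = 900; // sentinel outside the valid 0-15 slot range
    std::vector<std::uint32_t> lOffsets;
    lOffsets.reserve(pSlots.size());

    for (std::uint32_t lSlot : pSlots)
    {
        // First element of a slot starts at 0; later ones are appended by the runtime.
        lOffsets.push_back(lSlot == lLastSlot ? kAppendAligned : 0u);
        lLastSlot = lSlot;
    }
    return lOffsets;
}
```

For slots {0, 0, 0, 1, 1}, the first and fourth elements get offset 0 and the rest get the append marker.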

Now we start to get into the territory that infuriates me about D3D shader reflection.  And the worst part is that I understand why this limitation exists, and I understand it’s reasonable for this to be the way it is, but I’m mad that I can’t write code that fully automates this process and allows me to be as lazy as I want to be.  I am referring to the unholy trio of InputSlot, InputSlotClass, and InstanceDataStepRate.  These are related fields, and if you don’t ever do anything with combining multiple input streams you can safely default these to 0, D3D11_INPUT_PER_VERTEX_DATA, and 0 and live a happy, carefree life.  However, if you’re doing any batching through input stream combining, this becomes a very different, and annoying story.

See, the reflection system is able to glean every other necessary piece of data from your HLSL because it directly exists in your HLSL; that’s how reflection works.  However, there is nothing there to denote what input slot a parameter belongs to, if it’s per_vertex or per_instance data, or what the step rate is if it’s per_instance data.  There aren’t even any optional syntax keywords to give the reflection system hints at what you want, which would be acceptable for making this work.  Instead, you get nothing!  So, my solution was to create a small metadata file for each vertex shader file that just denotes input slot, input slot class, and instance data step rate per input struct parameter.  If you do the same thing, it’s worth noting that D3D11_INPUT_PER_VERTEX_DATA is 0 and D3D11_INPUT_PER_INSTANCE_DATA is 1.
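As an example of what such a metadata file could contain, the sketch below parses one whitespace-separated triple per input parameter, in the same order the function above reads them (input slot, input slot class, step rate).  The one-triple-per-line layout, the SlotMetadata struct, and ReadSlotMetadata are assumptions for illustration; only the read order comes from the code earlier in this post.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// The three fields that reflection cannot supply.
struct SlotMetadata
{
    unsigned inputSlot = 0;
    unsigned inputSlotClass = 0;       // 0 = per-vertex data, 1 = per-instance data
    unsigned instanceDataStepRate = 0;
};

// Reads one "slot class stepRate" triple per input parameter.
// Parameters without a complete triple keep the per-vertex defaults.
std::vector<SlotMetadata> ReadSlotMetadata(std::istream& pIn, unsigned pParameterCount)
{
    std::vector<SlotMetadata> lResult(pParameterCount);
    for (SlotMetadata& lEntry : lResult)
    {
        SlotMetadata lParsed;
        if (pIn >> lParsed.inputSlot >> lParsed.inputSlotClass >> lParsed.instanceDataStepRate)
        {
            lEntry = lParsed;
        }
    }
    return lResult;
}
```

A shader with two per-vertex parameters followed by one per-instance parameter in slot 1 would then be described by the three lines "0 0 0", "0 0 0", and "1 1 1".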

So, I no longer have to create a C++ class per shader, and my shader manager automatically handles new shaders based on additions to my shaders enum, but I do have to create this metadata file per vertex fragment.  It’s sub-optimal to be sure, but definitely still a huge win over the old system.  And I was able to do the whole overhaul in a day.  So, if you’re in anywhere near the same boat I was, I’d definitely recommend looking into the shader reflection system.

However, there is one last caveat.  It doesn’t really matter to me as far as shipping my game this year, but I could see it being a pain at some point, and probably to other people.  By using the D3DReflect function, you cannot pass validation for the Windows App Store.  Microsoft has an insane plan wherein you cannot compile or reflect your shaders at runtime at all.  I understand the logic here, but I also can’t help but think this undermines what I think is great about the reflection system.  I was able to put in relatively minimal effort and reap a huge benefit.  If I wanted to bring my game “up to code” to pass app store validation, I would need to put in a lot more work to reflect my input layout into a file during compile time and then load that during runtime.  It’s not undoable by any means, but it really forces you to dive into the deep end or not at all as far as the tools they’re providing here.  I guess I’m just not a fan.
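For a sense of what the offline route might involve, here is a sketch that writes a reflected layout to a text stream at build time and reads it back at load time.  I have not built this; the LayoutElement struct, the text format, and both function names are hypothetical, and the fields simply mirror D3D11_INPUT_ELEMENT_DESC with plain integers so the example compiles without the D3D headers.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Plain-data mirror of D3D11_INPUT_ELEMENT_DESC (enums stored as integers).
struct LayoutElement
{
    std::string semanticName;
    unsigned semanticIndex;
    unsigned format;
    unsigned inputSlot;
    unsigned alignedByteOffset;
    unsigned inputSlotClass;
    unsigned instanceDataStepRate;
};

// Build-time step: element count on the first line, then one element per line.
void WriteLayout(std::ostream& pOut, const std::vector<LayoutElement>& pLayout)
{
    pOut << pLayout.size() << '\n';
    for (const LayoutElement& lE : pLayout)
    {
        pOut << lE.semanticName << ' ' << lE.semanticIndex << ' ' << lE.format << ' '
             << lE.inputSlot << ' ' << lE.alignedByteOffset << ' '
             << lE.inputSlotClass << ' ' << lE.instanceDataStepRate << '\n';
    }
}

// Load-time step: stops early and returns what it has if the data is truncated.
std::vector<LayoutElement> ReadLayout(std::istream& pIn)
{
    std::vector<LayoutElement> lLayout;
    std::size_t lCount = 0;
    if (!(pIn >> lCount)) return lLayout;

    for (std::size_t lI = 0; lI < lCount; ++lI)
    {
        LayoutElement lE{};
        if (!(pIn >> lE.semanticName >> lE.semanticIndex >> lE.format >> lE.inputSlot
                  >> lE.alignedByteOffset >> lE.inputSlotClass >> lE.instanceDataStepRate))
        {
            break;
        }
        lLayout.push_back(lE);
    }
    return lLayout;
}
```

The loaded elements would still need to be copied into real D3D11_INPUT_ELEMENT_DESC structs, with the semantic name strings kept alive until CreateInputLayout returns.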

Anyway, that’s that.  Hopefully you found something in all of this useful.  And at some point soon, I really will write about my finished multithreaded renderer.  But there may be a post or two before that happens.  We’ll see.

The Shift To GGX And A Glimpse Of Things To Come

I’m just going to jump into things and not really explain why there’s been a 4 month gap since the last post.  It should be pretty sufficient to just say… school.  However, I now have a lot of stuff to post about, so I’m hoping that means a lot of posts in quick succession.  And I think a lot of them are going to be a lot more directly technical in nature, complete with code samples.  We’ll see how that actually shakes out.

So, as mentioned in my last post, one of the things I wanted to do this year is move towards a more physically based system for rendering.  Now, I might have pipe dreams of implementing global illumination and a full microfacet model and really doing this thing right, but let’s be realistic.  I only get so much time around the rest of my classes and staying sane, and it can’t all be spent on my lighting pipeline.  I have to deliver features and tools so my team can make the game we’re making, and when it comes down to it our game could ship just fine on last year’s lighting model if it had to.  Luckily, short of implementing CryEngine, there were a number of much smaller things I could accomplish that had a significant impact on both the visual fidelity of the system and the ability for artists to properly represent a wider range of materials.  The first of these was to change the distribution function for my specular calculation.

Now a little back story.  Over the summer, Blizzard sent all of their graphics programmers to SIGGRAPH, and they were gracious enough to have that include me despite just being an intern.  While there, I attended the Physically Based Shading course, and it presented me with a lot of information that I hadn’t necessarily previously considered, but which made a lot of sense as soon as I saw it; it was basically my experience at GDC last year, all over again.  All of the talks brought a lot of useful information, but Brian Karis’ talk about UE4’s rendering system ended up being the true catalyst of that morning.  I did a lot more research on my own and ended up replacing my Blinn distribution function with GGX, and the results were pretty astonishing despite not adopting any other part of the Cook-Torrance microfacet model.  This change also rendered the specular texture completely unnecessary; I ended up repurposing those channels to store material properties like roughness, and it opened up a lot more possibilities as far as having varied materials across a single model.
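For reference, the two distribution functions in their commonly published forms can be sketched as below.  This is plain C++ rather than my actual HLSL, and the function names, the normalization factor on the Blinn-Phong term, and the roughness-squared remapping (the one described in Karis’ course notes) are assumptions that may not match what a given renderer uses.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Normalized Blinn-Phong distribution:
//   D = (n + 2) / (2 * pi) * (N.H)^n
float BlinnPhongD(float pNDotH, float pSpecPower)
{
    return (pSpecPower + 2.0f) / (2.0f * kPi) * std::pow(pNDotH, pSpecPower);
}

// GGX (Trowbridge-Reitz) distribution:
//   D = a^2 / (pi * ((N.H)^2 * (a^2 - 1) + 1)^2), with a = roughness^2
float GGXD(float pNDotH, float pRoughness)
{
    const float lA  = pRoughness * pRoughness;
    const float lA2 = lA * lA;
    const float lD  = pNDotH * pNDotH * (lA2 - 1.0f) + 1.0f;
    return lA2 / (kPi * lD * lD);
}
```

The practical difference is in the falloff: GGX keeps a longer tail away from the peak than Blinn-Phong does, which is where the softer highlight edges come from.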

Here is a before and after comparison of the distribution functions:

Blinn

GGX

Now, as a programmer, I am perfectly capable of designing and implementing this system.  However, I am no artist, and Photoshop is definitely not my friend.  So, the results here are based on very terrible texture manipulation by me, and you might be scratching your head over how much of an improvement this really was.  But I still think it proves my point well enough.  In the top image, the entire car looks plasticky, which is pretty common for Blinn.  In the bottom, the highlight tails vary, but overall feel much warmer.  The yellowness of the near point light also has more effect on the objects in the scene in the GGX version.  And due to storing material properties in textures, the GGX version is also able to present the canopy as shiny while the body is much more dull.

So, while this is certainly still not a fully physically based system, I was able to make significant improvements relatively quickly, and maintain near equal computational complexity.  I call that a win.  Next time, I plan to delve into my finished multithreaded renderer, and talk about how incredibly wrong some elements of the code I previously posted for it were.  So, it’ll be fun.  Especially for me.  Look forward to it!